Abstract

We show that any depth-d circuit for determining whether an n-node graph has an s-to-t path
of length at most k must have size $n^{\Omega(k^{1/d}/d)}$. The previous best circuit size lower bounds for
this problem were $n^{k^{\exp(-O(d))}}$ (due to Beame, Impagliazzo, and Pitassi [BIP98]) and $n^{\Omega((\log k)/d)}$
(following from a recent formula size lower bound of Rossman [Ros14]). Our lower bound is
quite close to optimal, since a simple construction gives depth-d circuits of size $n^{O(k^{2/d})}$ for this
problem (and strengthening our bound even to $n^{k^{\Omega(1/d)}}$ would require proving that undirected
connectivity is not in $\mathsf{NC}^1$).

Our proof is by reduction to a new lower bound on the size of small-depth circuits computing 
a skewed variant of the “Sipser functions” that have played an important role in classical circuit 
lower bounds [Sip83, Yao85, Has86]. A key ingredient in our proof of the required lower bound for 
these Sipser-like functions is the use of random projections, an extension of random restrictions 
which were recently employed in [RST15]. Random projections allow us to obtain sharper 
quantitative bounds while employing simpler arguments, both conceptually and technically, 
than in the previous works [Ajt89, BPU92, BIP98, Ros14].
1 Introduction


Graph connectivity problems are of great interest in theoretical computer science, both from an 
algorithmic and a computational complexity perspective. The “st-connectivity,” or STCONN, 
problem — given an n-node graph G with two distinguished vertices s and t, is there a path of 
edges from s to t? — plays a particularly central role. One longstanding question is whether 
any improvement is possible on Savitch’s $O((\log n)^2)$-space algorithm [Sav70], based on “repeated
squaring,” for the directed STCONN problem; since this problem is complete for NL, any such
improvement would show that $\mathsf{NL} \subseteq \mathsf{SPACE}(o(\log^2 n))$, and hence would have a profound impact
on our understanding of non-deterministic space complexity. Wigderson’s survey [Wig92] provides 
a now somewhat old, but still very useful, overview of early results on connectivity problems. 

In this paper we consider the “small distance connectivity” problem STCONN(k(n)), which is
defined as follows. The input is the adjacency matrix of an undirected n-vertex graph G which has
two distinguished vertices s and t, and the problem is to determine whether G contains a path of
length at most k(n) from s to t. We study this problem from the perspective of small-depth circuit
complexity; for a given depth d (which may depend on k), we are interested in the size of unbounded
fan-in depth-d circuits of AND, OR, and NOT gates that compute STCONN(k(n)). (As several
authors [BIP98, Ros14] have observed, the directed and undirected versions of the STCONN(k(n))
problem are essentially equivalent via a simple reduction that converts a directed graph into a
layered undirected graph; for simplicity we focus on the undirected problem in this paper.)

An impetus for this study comes from the above-mentioned question about Savitch’s algorithm. 
As noted by Wigderson [Wig92], a simple reduction shows that if Savitch’s algorithm is optimal, 
then for all k, polynomial-size unbounded fan-in circuits for STCONN(k(n)) must have depth
$\Omega(\log k)$. By giving lower bounds on the size of small-depth circuits for STCONN(k(n)), Beame,
Impagliazzo, and Pitassi [BIP98] have shown that depth $\Omega(\log\log k)$ is required for $k(n) \le \log n$,
and more recently Rossman [Ros14] has shown that depth $\Omega(\log k)$ is required for $k(n) \le \log\log n$.
These bounds for restricted ranges of k motivate further study of small-depth circuits for
STCONN(k(n)). Below we give a more thorough discussion of both upper and
lower bounds for this problem, before presenting our new results.

1.1 Prior results 

Upper bounds (folklore). A natural approach to obtain efficient circuits for STCONN(k(n)) is
by repeated squaring of the input adjacency matrix. If $x_{i,j}$ is the input variable that takes value
1 if edge {i,j} is present in the input graph, then the graph contains a path of length at most 2
from i to j if and only if the depth-2 circuit $\bigvee_{k=1}^{n} (x_{i,k} \wedge x_{k,j})$ is satisfied (assuming that $x_{i,i} = 1$ for
every i). Iterating this construction yields a circuit of size poly(n) and depth $2\log k$ that computes
STCONN(k(n)), whenever k is a power of two.
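
The repeated-squaring idea can be illustrated with a minimal Python sketch. It computes the same relation as the folklore circuit (one Boolean matrix squaring per pair of circuit layers) rather than building gates, and it assumes k is a power of two. The function name `within_distance` and the list-of-lists adjacency format are choices made here for illustration only.

```python
def within_distance(adj, k):
    """reach[i][j] is True iff some path of length at most k joins i and j.

    adj is a symmetric 0/1 adjacency matrix; k is assumed to be a power of two.
    """
    n = len(adj)
    # Setting the diagonal to True (x_{i,i} = 1) makes "exactly two steps"
    # cover "at most two steps" in every squaring round.
    reach = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    steps = 1
    while steps < k:
        # One squaring: an OR over midpoints m of (reach[i][m] AND reach[m][j]).
        reach = [[any(reach[i][m] and reach[m][j] for m in range(n))
                  for j in range(n)] for i in range(n)]
        steps *= 2
    return reach
```

Each pass through the loop doubles the covered distance, which mirrors the depth $2\log k$ of the circuit described above.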

For smaller depths, a natural extension of this approach leads to the following construction.
Let $G_0$ be the input graph. For every pair of nodes u, v in $G_0$, check by exhaustive search for paths
of length at most $t = k^{1/d}$ connecting these nodes. (We assume that $k^{1/d}$ is an integer in order
to avoid unnecessary technical details.) Note that this can be done simultaneously for every pair
of nodes by a (multi-output) depth-2 OR-of-ANDs circuit of size $n^{O(t)}$. Let $G_1$ be a new graph
that has an edge between u and v if and only if a path of length at most t connects these nodes.
In general, if we start with $G_0$ and repeat this procedure d times, we obtain a sequence of graphs
$G_0, G_1, \ldots, G_d$ for which the following holds: $G_i$ has an edge between nodes u and v if and only
if they are connected by a path of length at most $t^i$ in the initial graph $G_0$. In particular, this
construction provides a circuit of depth 2d and size $n^{O(k^{1/d})}$ that computes STCONN(k(n)).

Summarizing this discussion, it follows that for all $k \le n$ and $d \le \log k$, STCONN(k(n)) can be
computed by depth-2d circuits of size $n^{O(k^{1/d})}$, or equivalently by depth-d circuits of size $n^{O(k^{2/d})}$.
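
As a rough illustration of the d-round construction, the following Python sketch builds the sequence of graphs by replacing "connected by a path of length at most t" with an edge, d times in a row. It is a sketch of the relation computed, not of the circuit itself: the exhaustive search over paths is replaced by repeated Boolean matrix products, and the names `within_t_steps` and `coarsen` are hypothetical.

```python
def within_t_steps(adj, t):
    """Relation "joined by a path of length at most t" for a 0/1 adjacency matrix."""
    n = len(adj)
    step = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    reach = step
    for _ in range(t - 1):
        # Extend every path by at most one more edge.
        reach = [[any(reach[i][m] and step[m][j] for m in range(n))
                  for j in range(n)] for i in range(n)]
    return reach


def coarsen(adj, t, d):
    """Apply the "at most t steps" replacement d times, starting from adj."""
    graph = adj
    for _ in range(d):
        graph = within_t_steps(graph, t)
    return graph
```

After d rounds, two nodes are adjacent exactly when their distance in the starting graph is at most t to the power d, which equals k when t is the d-th root of k.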

Lower bounds. Furst, Saxe, and Sipser [FSS84] were the first to show that STCONN =
STCONN(n) $\notin \mathsf{AC}^0$ via a reduction from their lower bound against small-depth circuits computing
the parity function. By the same reduction, Håstad’s subsequent optimal lower bound against
parity [Has86] implies that depth-d circuits computing STCONN(k(n)) must have size $2^{\Omega(k^{1/(d+1)})}$;
in particular, for $k(n) = (\log n)^{\omega(1)}$, polynomial-size circuits computing STCONN(k(n)) must have
depth $d = \Omega(\log k/\log\log n)$. Note, however, that this is not a useful bound for small distance
connectivity, since when $k(n) = o(\log n)$ the lower bound is less than n and hence trivial.

Ajtai [Ajt89] was the first to show that STCONN(k(n)) $\notin \mathsf{AC}^0$ for all $k(n) = \omega(1)$; however, his
proof did not yield an explicit circuit size lower bound. His approach was further analyzed and
simplified by Bellantoni, Pitassi, and Urquhart [BPU92], who showed that this technique gives a (barely
super-polynomial) $n^{\Omega(\log^{(d+3)} k)}$ lower bound on the size of depth-d circuits for STCONN(k(n)),
where $\log^{(i)}$ denotes the i-times iterated logarithm. This implies that polynomial-size circuits
computing STCONN(k(n)) must have depth $\Omega(\log^* k)$.

Beame, Impagliazzo, and Pitassi [BIP98] gave a significant quantitative strengthening of Ajtai’s
result in the regime where k(n) is not too large. For $k(n) \le \log n$, they showed that any depth-d
circuit for STCONN(k(n)) must have size $n^{\Omega(k^{\phi^{-2d}/3})}$, where $\phi = (\sqrt{5}+1)/2$ is the golden ratio.
Their arguments are based on a special-purpose “connectivity switching lemma” that they develop,
which combines elements of both the Ajtai [Ajt83] “independent set style” switching lemma and
the later approach to switching lemmas given by Yao [Yao85], Håstad [Has86], and Cai [Cai86].

Observe that the [BIP98] lower bound shows that polynomial-size circuits for STCONN(k(n))
require depth $\Omega(\log\log k)$ (and as noted above, the [BIP98] lower bound only holds for $k(n) \le \log n$).
Beame et al. asked whether this $\Omega(\log\log k)$ could be improved to $\Omega(\log k)$, which is optimal by the
upper bound sketched above. This was achieved recently by Rossman [Ros14], who showed that
for $k(n) \le \log\log n$, polynomial-size circuits for STCONN(k(n)) require depth $\Omega(\log k)$. In more
detail, he showed that for $k(n) \le \log\log n$ and $d(n) \le \log n/(\log\log n)^{O(1)}$, depth-d formulas for
STCONN(k(n)) require size $n^{\Omega(\log k)}$. By the trivial relation between formulas and circuits (every
circuit of size S and depth d is computed by a formula of size $S^d$ and depth d), this implies that
for such k(n) and d(n), depth-d circuits for STCONN(k(n)) require size $n^{\Omega((\log k)/d)}$. While this
answers the question of Beame et al., the $n^{\Omega((\log k)/d)}$ circuit size bound that follows from Rossman’s
formula size bound is significantly smaller than the circuit size bound of [BIP98] when
d is small. Furthermore, Rossman’s result only holds for $k(n) \le \log\log n$ whereas [BIP98]’s holds
for $k(n) \le \log n$ (and ideally we would like a lower bound for all distances $k(n) \le n$).

1.2 Our results 

Our main result is a near-optimal lower bound for the small-depth circuit size of STCONN(k(n))
for all distances $k(n) \le n$. We prove the following:

Theorem 1. For any $k(n) \le n^{1/5}$ and any d = d(n), any depth-d circuit computing STCONN(k(n))
must have size $n^{\Omega(k^{1/d}/d)}$. Furthermore, for any $k(n) \le n$ and any d = d(n), any depth-d circuit
computing STCONN(k(n)) must have size $n^{\Omega(k^{1/(5d)}/d)}$.
Source                  Circuit size                        Depth of poly-size circuits     Range of k’s

Implicit in [Has86]     $2^{\Omega(k^{1/(d+1)})}$           $\Omega(\log k/\log\log n)$     All k
[Ajt89, BPU92]          $n^{\Omega(\log^{(d+3)} k)}$        $\Omega(\log^* k)$              All k
[BIP98]                 $n^{\Omega(k^{\phi^{-2d}/3})}$      $\Omega(\log\log k)$            $k \le \log n$
[Ros14]                 $n^{\Omega((\log k)/d)}$            $\Omega(\log k)$                $k \le \log\log n$
Folklore upper bound    $n^{O(k^{2/d})}$                    $2\log k$                       All k
This work               $n^{\Omega(k^{1/d}/d)}$             $\Omega(\log k/\log\log k)$     $k \le n^{1/5}$
                        $n^{\Omega(k^{1/(5d)}/d)}$                                          All k

Table 1: Previous work and our results on the size of depth-d circuits for STCONN(k(n)). The
column “Range of k’s” indicates the values of k for which the lower bound is proved to hold.


Our lower bound is very close to the best possible, given the $n^{O(k^{2/d})}$ upper bound. Indeed,
strengthening our theorem to $n^{k^{\Omega(1/d)}}$ for all values of k and d would imply a breakthrough in circuit
complexity, showing that unbounded fan-in circuits of depth $o(\log n)$ computing STCONN must
have super-polynomial size. Since every function in $\mathsf{NC}^1$ can be computed by unbounded fan-in
circuits of polynomial size and depth $O(\log n/\log\log n)$ (see e.g. [KPPY84]), such a strengthening
would yield an unconditional proof that STCONN $\notin \mathsf{NC}^1$.

Comparing to previous work, our $n^{\Omega(k^{1/d}/d)}$ bound subsumes the main lower
bound result of Beame et al. [BIP98] for all depths d, and improves the $n^{\Omega((\log k)/d)}$ circuit size lower
bound that follows from Rossman’s formula size lower bound [Ros14] except when d is quite close
to log k (specifically, except when $\Omega(\log k/\log\log k) \le d \le O(\log k)$). For large distances k(n) for
which the results of [BIP98, Ros14] do not apply (i.e. $k(n) = \omega(\log n)$), our lower bound subsumes
the $2^{\Omega(k^{1/(d+1)})}$ lower bound that is implied by [Has86] for all distances $k(n) \le n^{1/5}$ and depths d,
and it subsumes the subsequent $n^{\Omega(\log^{(d+3)} k)}$ lower bound of [Ajt89, BPU92] for all distances k and
depths d.

Another perspective on Theorem 1 is that it implies that polynomial-size circuits require depth
$\Omega(\log k/\log\log k)$ to compute STCONN(k(n)) for all distances $k(n) \le n$. While Rossman’s results
give $\Omega(\log k)$, they hold only for the significantly restricted range $k(n) \le \log\log n$. (And indeed, as
noted above, a lower bound of $\Omega(\log k)$ for all k(n) would imply that STCONN $\notin \mathsf{NC}^1$.)

1.3 Our approach 

Previous state-of-the-art results on this problem employed rather sophisticated arguments and 
involved machinery. Beame et al. [BIP98] (as well as the earlier works of [Ajt89, BPU92]) obtained 
their lower bounds by considering the STCONN(k(n)) problem on layered graphs of permutations,
i.e., graphs with k+1 layers of n vertices per layer in which the induced graph between adjacent
layers is a perfect bipartite matching. They developed a special-purpose “connectivity switching
lemma” that bounds the depth of specialized decision trees for randomly-restricted layered graphs.
Rossman [Ros14] considered random subgraphs of the “complete k-layered graph” (with k+1 layers
of n vertices and $kn^2$ edges) where each edge is independently present with probability 1/n. At the
heart of his proof is an intricate notion of “pathset complexity,” which roughly speaking measures 
the minimum cost of constructing a set of paths via the operations of union and relational join, 
subject to certain “density constraints.” 

In contrast, we feel that our approach is both conceptually and technically simple. Instead of 
working with layered permutation graphs or random subgraphs of the complete layered graph, we 
consider a class of series-parallel graphs that are obtained in a straightforward way (see Section 3) 
from a skewed variant of the “Sipser functions” that have played an important role in the classical 
circuit lower bounds of Sipser [Sip83], Yao [Yao85], and Håstad [Has86]. Briefly, for every $d \in \mathbb{N}$,
the d-th Sipser function $\mathsf{Sipser}_d$ is a read-once monotone formula with d alternating layers of AND
and OR gates of fan-in w, where $w \in \mathbb{N}$ is an asymptotic parameter that tends to $\infty$ (and so $\mathsf{Sipser}_d$
computes an $n = w^d$ variable function). Building on the work of Sipser and Yao, Håstad used the
Sipser functions^1 to prove an optimal depth-hierarchy theorem for circuits, showing that for every
$d \in \mathbb{N}$, any depth-d circuit computing $\mathsf{Sipser}_{d+1}$ must have size $\exp(n^{\Omega(1/d)})$.

The skewed variant of the Sipser functions that we use to prove our near-optimal lower bounds
for STCONN(k(n)) is as follows. For every $d \in \mathbb{N}$ and $2 \le u \le w$, the d-th u-skewed Sipser
function, denoted $\mathsf{SkewedSipser}_{u,d}$, is essentially $\mathsf{Sipser}_{2d+1}$ but with the AND gates having fan-in
u rather than w (see Section 3 for a precise definition; as we will see, the number of levels of AND
gates is the key parameter for SkewedSipser, which is why we write $\mathsf{SkewedSipser}_{u,d}$ to denote the
n-variable formula that has d levels of AND gates and d+1 levels of OR gates). Via a simple
reduction given in Section 3, we show that to get lower bounds for depth-d circuits computing
STCONN($u^d$) on n-node graphs, it suffices to prove that depth-d circuits for $\mathsf{SkewedSipser}_{u,d}$ must
have large size. Under this reduction the fan-in of the AND gates is directly related to the length of
(potential) paths between s and t. This is why we must use a skewed variant of the Sipser function
in order to obtain lower bounds for small distance connectivity. We remark that even the case
u = 2 is interesting and can be used to get the bound of Theorem 1 for k up to

roughly Allowing a range of values for u enables us to get the lower bound for k up to 

(as stated in Theorem 1). 

Our main technical result of the paper is a lower bound for $\mathsf{SkewedSipser}_{u,d}$, a formula of depth
2d+1 over $n = n(u, w, d) = u^d w^{d+33/100}$ variables (for technical reasons we use a smaller fan-in for
the first layer of OR gates next to the inputs).

Theorem 2. Let $d(w) \ge 1$ and $2 \le u(w) \le w^{33/100}$, where $w \to \infty$. Then any depth-d circuit
computing $\mathsf{SkewedSipser}_{u,d}$ has size at least $n^{\Omega(u/d)}$.

Observe that setting $u = k^{1/d}$ this size lower bound is $n^{\Omega(k^{1/d}/d)}$; therefore we indeed obtain
the lower bound for STCONN(k(n)) stated in Theorem 1 as a corollary. As we point out in Section
6 (Remark 18), the lower bound given in Theorem 2 for SkewedSipser is essentially optimal.

Though they are superficially similar, Theorem 2 and Håstad’s depth hierarchy theorem differ
in two important respects. Both result from our goal of using Theorem 2 to get lower bounds for
small distance connectivity, and both pose significant challenges in extending Håstad’s proof:

1. Håstad showed that depth-d unbounded fan-in circuits require large size to compute a single
highly symmetric “hard function,” namely $\mathsf{Sipser}_{d+1}$. In contrast, toward our goal of
understanding the depth-d circuit size of STCONN(k(n)) for all values of k = k(n) and d = d(n),
we seek lower bounds on the size of depth-d unbounded fan-in circuits computing any one
of a spectrum of asymmetric hard functions, namely $\mathsf{SkewedSipser}_{u,d}$ for all $u = k^{1/d}$ (with
stronger quantitative bounds as k and u get larger).

2. To get the strongest possible result in his depth hierarchy theorem, Håstad (like Yao and
Sipser) was primarily focused on lower bounding the size of circuits of depth exactly one less
than $\mathsf{Sipser}_{d+1}$. In contrast, since in our framework our goal is to lower bound the size of
depth-d circuits computing $\mathsf{SkewedSipser}_{u,d}$ (corresponding to STCONN(k(n)) with $k = u^d$),
which has depth 2d+1, we are interested in the size of circuits of depth (roughly) half that
of our hard function $\mathsf{SkewedSipser}_{u,d}$.

^1 The exact definition of the function used in [Has86] differs slightly from our description for some technical reasons
which are not important here.

In Section 2 we recall the high-level structure of Håstad’s proof of his depth hierarchy theorem
(based on the method of random restrictions), highlight the issues that arise due to each of the two 
differences above, and describe how our techniques — specifically, the method of random projections 
— allow us to prove Theorem 2 in a clean and simple manner. 


2 Håstad’s depth hierarchy theorem, random projections, and
proof outline of Theorem 2 

Håstad’s depth hierarchy theorem and its proof. Recall that Håstad’s depth hierarchy
theorem shows that $\mathsf{Sipser}_{d+1}$ cannot be computed by any circuit C of depth d and size $\exp(n^{O(1/d)})$.
The main idea is to design a sequence of random restrictions $\{\mathcal{R}_i\}_{2 \le i \le d}$ satisfying two competing
requirements:

• Circuit C collapses. The randomly restricted circuit $C \upharpoonright \rho^{(d)} \cdots \rho^{(2)}$, where $\rho^{(i)} \leftarrow \mathcal{R}_i$ for
$2 \le i \le d$, collapses to a “simple function” with high probability. This is shown via iterative
applications of a switching lemma for the $\mathcal{R}_i$’s, where each application shows that with high
probability a random restriction $\rho^{(i)}$ decreases the depth of the circuit $C \upharpoonright \rho^{(d)} \cdots \rho^{(i+1)}$ by
at least one. The upshot is that while C is a size-S depth-d circuit, $C \upharpoonright \rho^{(d)} \cdots \rho^{(2)}$ collapses
to a small-depth decision tree (i.e. a “simple function”) with high probability.

• Hard function $\mathsf{Sipser}_{d+1}$ retains structure. In contrast with the circuit C, the hard
function $\mathsf{Sipser}_{d+1}$ is “resilient” against the random restrictions $\rho^{(i)} \leftarrow \mathcal{R}_i$. In particular,
each random restriction $\rho^{(i)}$ simplifies Sipser only by one layer, and so $\mathsf{Sipser}_{d+1} \upharpoonright \rho^{(d)} \cdots \rho^{(i)}$
contains $\mathsf{Sipser}_i$ as a subfunction with high probability. Therefore, with high probability,
$\mathsf{Sipser}_{d+1} \upharpoonright \rho^{(d)} \cdots \rho^{(2)}$ still contains $\mathsf{Sipser}_2$ as a subfunction, and hence is a “well-structured
function” which cannot be computed by a small-depth decision tree.

We remind the reader that to satisfy these competing demands, the random restrictions $\{\mathcal{R}_i\}$
devised by Håstad specifically for his depth hierarchy theorem are not the “usual” random restrictions
where each coordinate is independently kept alive with probability $p \in (0,1)$, and set to a uniform
bit otherwise (it is not hard to see that Sipser does not retain structure under these random
restrictions). Likewise, the switching lemma for the $\mathcal{R}_i$’s is not the “standard” switching lemma
(which Håstad used to obtain his optimal lower bounds against the parity function). Instead, at
the heart of Håstad’s proof are new random restrictions $\{\mathcal{R}_i\}_{2 \le i \le d}$ designed to satisfy both
requirements above: the coordinates of $\mathcal{R}_i$ are carefully correlated so that $\mathsf{Sipser}_{d+1}$ retains structure, and
Håstad proved a special-purpose switching lemma showing that C collapses under these carefully
tailored new random restrictions.

Issues that arise in our setting. At a technical level (related to point (1) described at
the end of Section 1), Håstad’s special-purpose switching lemma is not useful for analyzing our
$\mathsf{SkewedSipser}_{u,d}$ formulas for most values of $u = k^{1/d}$ of interest, since they have a “fine structure”
that is destroyed by his too-powerful random restrictions. His switching lemma establishes that any
DNF of width $n^{O(1/d)}$ collapses to a small-depth decision tree with high probability when it is hit by
a random restriction $\rho^{(i)} \leftarrow \mathcal{R}_i$. Observe that his hard function $\mathsf{Sipser}_{d+1}$ has DNF-width $\Omega(\sqrt{n})$, so
his switching lemma does not apply to it (and indeed as discussed above, hitting $\mathsf{Sipser}_{d+1}$ with his
random restriction results in a well-structured function that still contains $\mathsf{Sipser}_d$ as a subfunction
with high probability). In contrast, in our setting the hard function $\mathsf{SkewedSipser}_{u,d}$ has d levels
of AND gates of fan-in u, and in particular, can be written as a DNF of width $u^d = k$. So for all
k = k(n) and d = d(n) such that $k \ll n^{O(1/d)}$ (indeed, this holds for most values of k and d of
interest), the relevant hard function $\mathsf{SkewedSipser}_{u,d}$ collapses to a small-depth decision tree after
a single application of Håstad’s random restriction.

Next (related to point (2)), recall that the formula computing Håstad’s hard function $\mathsf{Sipser}_{d+1}$
has a highly regular structure where the fan-ins of all gates, both ANDs and ORs, are the same.
As discussed above, Håstad employs a random restriction which (with high probability) “peels off”
a single layer of $\mathsf{Sipser}_{d+1}$ and results in a function that contains $\mathsf{Sipser}_d$ as a subfunction. Due
to their regular structures, $\mathsf{Sipser}_d$ is dual to $\mathsf{Sipser}_{d+1}$ (more precisely, the bottom-layer depth-2
subcircuits of $\mathsf{Sipser}_d$ are dual to those of $\mathsf{Sipser}_{d+1}$), and this allows Håstad to repeat the same
procedure d−1 times. In contrast, in our setting we are dealing with the highly asymmetric
$\mathsf{SkewedSipser}_{u,d}$ formulas where the fan-ins of the AND gates are much less than those of the
OR gates. Therefore, in order to reduce to a smaller instance of the same problem, our setup
requires that we peel off two layers of $\mathsf{SkewedSipser}_{u,d}$ at a time rather than just one as in Håstad’s
argument. To put it another way, while Håstad’s switching lemma uses a single layer of his hard
function $\mathsf{Sipser}_{d+1}$ (i.e. disjoint copies of ORs/ANDs of fan-in w) to “trade for” one layer of
depth reduction in C, our switching lemma will use two layers of our hard function $\mathsf{SkewedSipser}_{u,d}$
(i.e. disjoint copies of read-once CNFs with $u = k^{1/d}$ clauses of width w) to trade for one layer of
depth reduction in C.

Our approach: random projections. A key technical ingredient in Håstad’s proof of his
depth hierarchy theorem (and indeed, in the works of [BIP98, Ros14] on STCONN(k(n)) as well)
is the method of random restrictions. In particular, they all employ switching lemmas which
show that a randomly-restricted small-width DNF collapses to a small-depth decision tree with
high probability: as mentioned above, Håstad proved a special-purpose switching lemma for random
restrictions tailored for the Sipser functions, while Beame et al. developed a “connectivity switching
lemma” for random restrictions of layered permutation graphs, and Rossman used Håstad’s “usual”
switching lemma in conjunction with his pathset complexity machinery.

In this paper we work with random projections, a generalization of random restrictions. Given a
set of formal variables $\mathcal{X} = \{x_1, \ldots, x_n\}$, a restriction $\rho$ either fixes a variable $x_i$ (i.e. $\rho(x_i) \in \{0,1\}$)
or keeps it alive (i.e. $\rho(x_i) = x_i$, often denoted by $*$). A projection, on the other hand, either fixes
$x_i$ or maps it to a variable $y_j$ from a possibly different space of formal variables $\mathcal{Y} = \{y_1, \ldots, y_m\}$.
Restrictions are therefore a special case of projections where $\mathcal{Y} = \mathcal{X}$, and each $x_i$ can only be
fixed or mapped to itself. (See Section 4 for precise definitions.) Our arguments crucially employ
projections in which $\mathcal{Y}$ is smaller than $\mathcal{X}$, and where moreover each $x_i$ is only mapped to a specific
element $y_j$ where j depends on i in a carefully designed way that depends on the structure of the
formula computing the SkewedSipser function. Such “collisions”, where multiple formal variables
in $\mathcal{X}$ are mapped to the same new formal variable $y_j \in \mathcal{Y}$, play an important role in our approach.
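
To make the distinction between restrictions and projections concrete, here is a small Python sketch. The representation is an assumption made for illustration: a Boolean function is a Python callable on a dictionary of variable values, and a projection is a dictionary sending each x-variable either to a constant 0/1 or to the name of a target variable. The helper `project` and the toy formula `f` are hypothetical and are not the projections used in the paper.

```python
def project(f, proj):
    """Return the function over the target variables induced by the projection."""
    def g(y):
        # Each x-variable receives its fixed constant, or the current value
        # of the variable it was mapped to.
        x = {var: (target if target in (0, 1) else y[target])
             for var, target in proj.items()}
        return f(x)
    return g


def f(x):
    # A toy read-once CNF: (x1 OR x2) AND (x3 OR x4).
    return (x["x1"] or x["x2"]) and (x["x3"] or x["x4"])


# A projection with a collision: x1 and x2 are both sent to y1.
collide = project(f, {"x1": "y1", "x2": "y1", "x3": 1, "x4": 0})

# A restriction is the special case where live variables map to themselves.
restrict = project(f, {"x1": "x1", "x2": 0, "x3": "x3", "x4": 0})
```

In this toy case `collide` is simply the single variable y1, while `restrict` is the AND of x1 and x3: the collision removes a variable that no restriction could remove without fixing it.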

Random projections were used in the recent work of Rossman, Servedio, and Tan [RST15], where
they are the key ingredient enabling that paper’s average-case extension of Håstad’s worst-case
depth hierarchy theorem. In earlier work, Impagliazzo, Paturi, and Saks [IPS97] used random
projections to obtain size-depth tradeoffs for threshold circuits, and Impagliazzo and Segerlind [IS01]
used them to establish lower bounds against constant-depth Frege systems in proof complexity.
Our work provides further evidence for the usefulness of random projections in obtaining strong
lower bounds: random projections allow us to obtain sharper quantitative bounds while employing
simpler arguments, both conceptually and technically, than in the previous works [Ajt89, BPU92,
BIP98, Ros14] on the small-depth complexity of STCONN(k(n)).

We remark that although [RST15] and this work both employ random projections to reason
about the Sipser function (and its skewed variants), the main advantages offered by projections over
restrictions are different in the two proofs. In [RST15] the overarching challenge was to establish
average-case hardness, and the identification of variables was key to obtaining uniform-distribution
correlation bounds from the composition of highly-correlated random projections. As outlined
above, in this work a significant challenge stems from our goal of understanding the depth-d circuit
size of STCONN(k(n)) for all values of k = k(n) and d = d(n). The added expressiveness of random
projections over random restrictions is exploited both in the proof of our projection switching lemma
(see Section 2.1 below) and in the arguments establishing that our $\mathsf{SkewedSipser}_{u,d}$ functions “retain
structure” under our random projections.

2.1 Proof outline of Theorem 2 

Our approach shares the same high-level structure as Håstad’s depth hierarchy theorem, and is
based on a sequence $\Psi$ of d−1 random projections satisfying two competing requirements (it
will be more natural for us to present them in the opposite order from our discussion of Håstad’s
theorem in the previous section):

• Hard function SkewedSipser retains structure. Our random projections are defined with
the hard function SkewedSipser in mind, and are carefully designed so as to ensure that
$\mathsf{SkewedSipser}_{u,d}$ “retains structure” with high probability under their composition $\Psi$.

In more detail, each of the d−1 individual random projections comprising $\Psi$ peels off two
layers of SkewedSipser, and a randomly projected $\mathsf{SkewedSipser}_{u,\ell}$ contains $\mathsf{SkewedSipser}_{u,\ell-1}$
as a subfunction with high probability. These individual random projections are simple to
describe: each bottom-layer depth-2 subcircuit of $\mathsf{SkewedSipser}_{u,\ell}$ (a read-once CNF with
$u = k^{1/d}$ clauses of width w) independently “survives” with probability $q \in (0,1)$ and is
“killed” with probability 1 − q (where q is a parameter of the restrictions), and

- if it survives, all uw variables in the CNF are projected to the same fresh formal
variable (with different CNFs mapped to different formal variables);

- if it is killed, all its variables are fixed according to a random 0-assignment of the CNF
chosen uniformly from a particular set of 2u many 0-assignments.

In other words, each bottom-layer depth-2 subcircuit independently simplifies to a fresh formal
variable (with probability q) or the constant 0 (with probability 1 − q). With the appropriate
definition of SkewedSipser and choice of q, it is easy to verify that indeed a randomly projected
$\mathsf{SkewedSipser}_{u,\ell}$ contains $\mathsf{SkewedSipser}_{u,\ell-1}$ as a subfunction with high probability. (For this
to happen, the fan-in of the bottom OR gates of SkewedSipser is chosen to be moderately
smaller than w, the fan-in of all other OR gates in SkewedSipser; see Definition 3 for details.)

• Circuit C collapses. In contrast with $\mathsf{SkewedSipser}_{u,d}$, any depth-d circuit C of sufficiently small size
collapses to a small-depth decision tree under $\Psi$ with high probability. Following the standard
“bottom-up” approach to proving lower bounds against small-depth circuits, we establish this
by arguing that each of the individual random projections comprising $\Psi$ “contributes to the
simplification” of C by reducing its depth by (at least) one.

More precisely, in Section 5 we prove a projection switching lemma, showing that a small- 
width DNF or CNF “switches” to a small-depth decision tree with high probability under 
our random projections. (The depth reduction of C follows by applying this lemma to every 
one of its bottom-level depth-2 subcircuits.) Recall that the random projection of a depth-2
circuit over a set of formal variables $\mathcal{X}$ yields a function over a new set of formal variables $\mathcal{Y}$,
and in our case $\mathcal{Y}$ is significantly smaller than $\mathcal{X}$. In addition to the structural simplification
that results from setting variables to constants (as in the switching lemmas of [Has86, BIP98,
Ros14] for random restrictions), the proof of our projection switching lemma also exploits the
additional structural simplification that results from distinct variables in $\mathcal{X}$ being mapped to
the same variable in $\mathcal{Y}$.

2.2 Preliminaries 

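A restriction over a finite set of variables A is an element of $\{0,1,*\}^A$. We define the composition
$\rho\rho'$ of two restrictions $\rho, \rho' \in \{0,1,*\}^A$ over a set of variables A to be the restriction

$$(\rho\rho')_\alpha = \begin{cases} \rho_\alpha & \text{if } \rho_\alpha \in \{0,1\} \\ \rho'_\alpha & \text{otherwise} \end{cases} \qquad \text{for all } \alpha \in A.$$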
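
A minimal Python sketch of composing two restrictions follows. It assumes the usual convention that the first restriction takes precedence wherever it fixes a variable and that the second one is consulted only on variables the first leaves alive. The dictionary representation and the names `STAR` and `compose` are illustrative choices.

```python
STAR = "*"  # marks a variable that is kept alive


def compose(rho, rho_prime):
    """Compose two restrictions given as dicts over the same variable set."""
    return {a: (rho[a] if rho[a] in (0, 1) else rho_prime[a]) for a in rho}


rho = {"a": 0, "b": STAR, "c": STAR}
rho_prime = {"a": 1, "b": 1, "c": STAR}
combined = compose(rho, rho_prime)
```

Here `combined` fixes a to 0 (from the first restriction), fixes b to 1 (from the second), and leaves c alive.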

A DNF is an OR of ANDs (terms) and a CNF is an AND of ORs (clauses). The width of a 
DNF (respectively, CNF) is the maximum number of variables that occur in any one of its terms 
(respectively, clauses). 

The size of a circuit is its number of gates, and the depth of a circuit is the length of its longest 
root-to-leaf path. We count input variables as gates of a circuit (so any circuit for a function that 
depends on all n input variables trivially has size at least n). We will assume throughout the paper 
that circuits are alternating, meaning that every root-to-leaf path alternates between AND gates 
and OR gates. We also assume that circuits are layered, meaning that for every gate G, every root- 
to-G path has the same length. These assumptions are without loss of generality as by a standard 
conversion (see e.g. the discussion at [Sta]), every depth-d size-S circuit is equivalent to a depth-d
alternating layered circuit of size at most poly(S) (this polynomial increase is offset by the “$\Omega(\cdot)$”
notation in the exponent of all of our theorem statements).


3 Lower bounds against SkewedSipser yield lower bounds for small 
distance connectivity 

In this section we define $\mathsf{SkewedSipser}_{u,d}$ and show that computing this formula on a particular input
z is equivalent to solving small-distance connectivity on a certain undirected (multi)graph G(z).
In a bit more detail, every input z corresponds to a subgraph G(z) of a fixed ground graph G that
depends only on $\mathsf{SkewedSipser}_{u,d}$. (Jumping ahead, we associate each input bit of $\mathsf{SkewedSipser}_{u,d}$
with an edge of its corresponding ground graph G.) Roughly speaking, AND gates translate into
sequential paths, while OR gates correspond to parallel paths. After defining $\mathsf{SkewedSipser}_{u,d}$ and
describing this reduction, we give the proof of Theorem 1, assuming Theorem 2.

The SkewedSipser formula is defined in terms of an integer parameter w; in all our results this
is an asymptotic parameter that approaches $+\infty$, and so w should be thought of as “sufficiently
large” throughout the paper.

Definition 3. For $2 \le u \le w$ and $d \ge 0$, $\mathrm{SkewedSipser}_{u,d}$ is the Boolean function computed by the
following monotone read-once formula:

• There are 2d + 1 alternating layers of OR and AND gates, where the top and bottom-layer
gates are OR gates. (So there are d + 1 layers of OR gates and d layers of AND gates.)

• AND gates all have fan-in u.

• OR gates all have fan-in w, except bottom-layer OR gates which have fan-in $w^{33/100}$.

We assume that $w^{33/100}$ is an integer throughout the paper. (The most important thing about the
constant 33/100 in the above definition is that it is less than 1; the particular value 33/100
was chosen for technical reasons so that we could get the constant 5 in Theorem 1.)

Consequently, $\mathrm{SkewedSipser}_{u,d}$ is a Boolean function over $n = u^d w^{d+33/100}$ variables in total.
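
As a quick sanity check on the variable count, the following Python sketch builds the formula tree of Definition 3 and counts its leaves and layers. The nested-tuple representation, the names `build_skewed_sipser`, `num_leaves` and `depth`, and the parameter `bottom_fanin` (a stand-in for $w^{33/100}$, chosen freely for toy sizes) are illustrative assumptions and not notation from the paper.

```python
from itertools import count


def build_skewed_sipser(u, w, d, bottom_fanin):
    """Return the formula as nested tuples ("OR"/"AND", children) or ("VAR", id).

    bottom_fanin plays the role of w^{33/100} (hypothetical toy parameter).
    """
    ids = count()  # fresh variable identifiers: the formula is read-once

    def or_gate(level):
        if level == 0:  # bottom-layer OR gate, directly over variables
            return ("OR", [("VAR", next(ids)) for _ in range(bottom_fanin)])
        return ("OR", [and_gate(level) for _ in range(w)])

    def and_gate(level):
        return ("AND", [or_gate(level - 1) for _ in range(u)])

    return or_gate(d)


def num_leaves(node):
    """Number of variables below a node."""
    if node[0] == "VAR":
        return 1
    return sum(num_leaves(child) for child in node[1])


def depth(node):
    """Number of gate layers on a root-to-leaf path."""
    if node[0] == "VAR":
        return 0
    return 1 + max(depth(child) for child in node[1])


formula = build_skewed_sipser(u=2, w=3, d=2, bottom_fanin=2)
print(num_leaves(formula), depth(formula))
```

With these toy parameters the leaf count agrees with $u^d \cdot w^d \cdot$ `bottom_fanin`, and the number of layers is 2d + 1.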

From $\mathrm{SkewedSipser}_{u,d}$ to small-distance connectivity. There is a natural correspondence between
read-once monotone Boolean formulas and series-parallel multigraphs in which each graph
has a special designated “start” node s and a special designated “end” node t. We now describe
this correspondence via the inductive structure of read-once monotone Boolean formulas. As we
shall see, under this correspondence there is a bijection between the variables of a formula f and
the edges of the graph G(f).

• If f(x) = x is a single variable, then the graph G(f) has vertex set V(f) = {s, t} and edge
set E(f) consisting of a single edge {s, t}.

• Let $f_1, \ldots, f_m$ be read-once monotone Boolean formulas over disjoint sets of variables, where
$G(f_i)$ is the (multi)graph associated with $f_i$ and $s_i, t_i$ are the start and end nodes of $G(f_i)$.

— If $f = \mathrm{AND}(f_1, \ldots, f_m)$: The graph G(f) is obtained by identifying $t_1$ with $s_2$, $t_2$ with
$s_3$, ..., and $t_{m-1}$ with $s_m$. The start node of G(f) is $s_1$ and the end node is $t_m$. Thus
the vertex set V(f) is $V(f_1) \cup \cdots \cup V(f_m) \setminus \{t_1, \ldots, t_{m-1}\}$ and the edge set E(f) is the
multiset $E'(f_1) \cup \cdots \cup E'(f_m)$, where each $E'(f_i)$ is obtained from $E(f_i)$ by renaming
the appropriate vertices.
Figure 1: A read-once formula f (on the left), which is a fan-in 4 OR of fan-in 3 ANDs of fan-in 2 ORs of
fan-in 2 ANDs, and the corresponding graph G(f) (on the right).


— If $f = \mathrm{OR}(f_1, \ldots, f_m)$: The graph G(f) is obtained by identifying $s_1, \ldots, s_m$ all to a
new start vertex s and $t_1, \ldots, t_m$ all to a new end vertex t. Thus the vertex set V(f) is
$V(f_1) \cup \cdots \cup V(f_m) \cup \{s, t\} \setminus \{s_1, \ldots, s_m, t_1, \ldots, t_m\}$, and the edge set is the multiset
$E'(f_1) \cup \cdots \cup E'(f_m)$, where again each $E'(f_i)$ is obtained from the corresponding edge
set $E(f_i)$ by renaming vertices accordingly.

Since f is read-once, the number of edges of G(f) is precisely the number of variables of f, and there
is a natural correspondence between edges and variables. Figure 1 provides a concrete example of
this construction.
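
The inductive construction can be sketched in a few lines of Python. This is only an illustration under assumed conventions: formulas are nested tuples `("VAR", name)`, `("AND", children)`, `("OR", children)`; vertices are integers with s = 0 and t = 1; and the function name `graph_of` is hypothetical. Rather than building the subgraphs and then identifying vertices, the sketch passes the two endpoints down the recursion, which yields the same series-parallel multigraph.

```python
from itertools import count


def graph_of(formula):
    """Return (edges, s, t) for a read-once monotone formula.

    edges is a list of (endpoint, endpoint, variable_name) triples, so
    parallel edges of a multigraph are kept as separate entries.
    """
    fresh = count(2)  # vertices 0 and 1 are reserved for s and t
    edges = []

    def add(f, a, b):
        kind = f[0]
        if kind == "VAR":
            edges.append((a, b, f[1]))  # a single edge {a, b}
        elif kind == "OR":
            for child in f[1]:          # parallel: every child spans a..b
                add(child, a, b)
        elif kind == "AND":
            kids = f[1]
            # series: t_i is identified with s_{i+1} via a fresh vertex
            ends = [a] + [next(fresh) for _ in kids[:-1]] + [b]
            for child, x, y in zip(kids, ends, ends[1:]):
                add(child, x, y)
        else:
            raise ValueError(f"unknown node type: {kind}")

    add(formula, 0, 1)
    return edges, 0, 1


f = ("OR", [("AND", [("VAR", "x1"), ("VAR", "x2")]),
            ("AND", [("VAR", "x3"), ("VAR", "x4")])])
edges, s, t = graph_of(f)
print(edges)
```

Each variable contributes exactly one edge, matching the edge/variable bijection described above; an OR placed directly over variables produces parallel edges between the same two vertices.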

Remark 4. We note that if f is a read-once monotone Boolean formula in which the bottom-level
gates are AND gates and have fan-in at least two, then G(f) is a simple graph and not a multigraph.

A simple inductive argument gives the following: 

Observation 5. If f is a read-once monotone alternating formula with r layers of AND gates of
fan-ins $a_1, \ldots, a_r$, respectively, then every shortest path from s to t in the graph G(f) has length
exactly $a_1 \cdots a_r$. Furthermore, if H is a subgraph of G(f) that contains some s-to-t path, then it
contains a path of length $a_1 \cdots a_r$.

As a corollary, we have: 

Observation 6. Every shortest path from s to t in $G(\mathrm{SkewedSipser}_{u,d})$ has length exactly $u^d$.

Given a read-once monotone formula f over variables $x_1, \ldots, x_n$ and an assignment $z \in \{0,1\}^n$
to the variables $x_1, \ldots, x_n$, we define the graph G(f, z) to be the (spanning) subgraph of G(f)
which has vertex set V(f, z) = V(f) and edge set E(f, z) defined as follows: each edge in E(f) is
present in E(f, z) if and only if the corresponding coordinate of z is set to 1. A simple inductive
argument gives the following:

Observation 7. Given a read-once monotone alternating formula f with r layers of AND gates of
fan-ins $a_1, \ldots, a_r$, respectively, and an assignment $z \in \{0,1\}^n$, the graph G(f, z) contains a path
from s to t of length $a_1 \cdots a_r$ if and only if f(z) = 1.
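
Observation 7 can be checked exhaustively on a small instance. The Python sketch below is an illustration only: the tuple encoding of formulas, the helper names `series_parallel_edges`, `evaluate` and `distance`, and the particular 8-variable formula (an AND of two ORs of fan-in-2 ANDs, so r = 2 and $a_1 a_2 = 4$) are assumptions made for the example.

```python
from collections import deque
from itertools import count, product


def series_parallel_edges(f):
    """Edges (a, b, variable_index) of G(f), with s = 0 and t = 1."""
    fresh, edges = count(2), []

    def add(g, a, b):
        if g[0] == "VAR":
            edges.append((a, b, g[1]))
        elif g[0] == "OR":                      # parallel composition
            for child in g[1]:
                add(child, a, b)
        else:                                   # "AND": series composition
            ends = [a] + [next(fresh) for _ in g[1][:-1]] + [b]
            for child, x, y in zip(g[1], ends, ends[1:]):
                add(child, x, y)

    add(f, 0, 1)
    return edges


def evaluate(f, z):
    """Evaluate the monotone formula f on the assignment z (a 0/1 tuple)."""
    if f[0] == "VAR":
        return bool(z[f[1]])
    values = [evaluate(child, z) for child in f[1]]
    return all(values) if f[0] == "AND" else any(values)


def distance(edges, z, s=0, t=1):
    """BFS distance from s to t in G(f, z); None if t is unreachable."""
    adjacency = {}
    for a, b, var in edges:
        if z[var]:                              # keep only edges set to 1
            adjacency.setdefault(a, []).append(b)
            adjacency.setdefault(b, []).append(a)
    dist, queue = {s: 0}, deque([s])
    while queue:
        v = queue.popleft()
        for nb in adjacency.get(v, []):
            if nb not in dist:
                dist[nb] = dist[v] + 1
                queue.append(nb)
    return dist.get(t)


def leaf_and(i, j):
    return ("AND", [("VAR", i), ("VAR", j)])


f = ("AND", [("OR", [leaf_and(0, 1), leaf_and(2, 3)]),
             ("OR", [leaf_and(4, 5), leaf_and(6, 7)])])
edges = series_parallel_edges(f)
agree = all(evaluate(f, z) == (distance(edges, z) == 4)
            for z in product((0, 1), repeat=8))
print(agree)
```

On this toy formula, every one of the 256 assignments satisfies f(z) = 1 exactly when the s-to-t distance in G(f, z) equals 4.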

From these observations we obtain the following connection between $\mathrm{SkewedSipser}_{u,d}$ and small-distance
connectivity, which is key to our lower bound:

Corollary 3.1. The multigraph $G(\mathrm{SkewedSipser}_{u,d}, z)$ contains an s-to-t path of length at most $u^d$
if and only if $\mathrm{SkewedSipser}_{u,d}(z) = 1$.


Note that Corollary 3.1 and Theorem 2 together can be used to prove lower bounds for small-distance
connectivity on multigraphs. One way to obtain lower bounds for simple graphs instead
of multigraphs is by extending $\mathrm{SkewedSipser}_{u,d}$ with an extra layer of fan-in two AND gates next
to the input variables, then relying on Remark 4. We use this simple observation and Theorem 2
to establish Theorem 1.

Theorem 1. For any $k(n) \le n^{1/5}$ and any d = d(n), any depth-d circuit computing STCONN(k(n))
must have size $n^{\Omega(k^{1/d}/d)}$. Furthermore, for any $k(n) \le n$ and any d = d(n), any depth-d circuit
computing STCONN(k(n)) must have size $n^{\Omega(k^{1/(5d)}/d)}$.

Proof. We assume that $d \le 2\log k/\log\log k$ and $(k/2)^{1/d} \ge 2$ (observe that the claimed bound is
trivial if $d > 2\log k/\log\log k$ or $(k/2)^{1/d} < 2$). Let

$$u_0 = \lfloor (k/2)^{1/d} \rfloor.$$

Then we have $u_0 \ge 2$ and $u_0 = \Omega(k^{1/d})$. For convenience, let

ko = Uq < kj2 and n = — 

Further, let $w_0$ be the largest positive integer such that

$$u_0^d\, w_0^{d+33/100} \le n'.$$

Observe that, since $k \le n^{1/5}$ and $d \le 2\log k/\log\log k$, as $n \to +\infty$ we have similarly $w_0 \to +\infty$.
Our choice of $w_0$ also implies that $w_0$ satisfies

$$u_0^d\, (w_0 + 1)^{d+33/100} > n'.$$

Let no Then from d < 2 log A:/log log A: and wq —)■ -|-oo we have 

no > no f 


( 1 ) 

( 2 ) 

-|-oo. 


Combining this with k < n}!^ and A;o < k/2 we have that ko = and no > k^'^ when n is 

sufficiently large. 

We define a variant of our $\mathrm{SkewedSipser}_{u,d}$ formula so we can rely on Remark 4 and work directly
with simple graphs instead of multigraphs. More precisely, let $\mathrm{SkewedSipser}^{\dagger}_{u_0,d}$ be analogous to
$\mathrm{SkewedSipser}_{u,d}$ with parameters $u_0$ (AND gate fan-in), $w_0$ (OR gate fan-in), and d but containing
an extra layer of fan-in 2 AND gates at the bottom connected to a new set of input variables. In
other words, this is a depth 2d + 2 read-once alternating formula with twice the number of input
variables of our original SkewedSipser formula (each input variable of SkewedSipser becomes an AND
gate connected to two new fresh variables). Since $\mathrm{SkewedSipser}_{u_0,d}$ can be obtained by restricting
$\mathrm{SkewedSipser}^{\dagger}_{u_0,d}$ appropriately (i.e. by setting to 1 a single variable in every new pair of variables),
a lower bound on the circuit complexity of SkewedSipser immediately implies the same lower bound
for $\mathrm{SkewedSipser}^{\dagger}$.
In order to obtain a lower bound via Theorem 2, we need that $w_0^{33/100} \ge u_0$. This is equivalent

to having no > , which follows from d>2 (we may assume d>2 since no depth-1 circuit, 

i.e. single AND or OR gate, can compute STCONN{k{n))) and no > since 

no > kt-^ > 

Consequently, we can apply Theorem 2 to $\mathrm{SkewedSipser}_{u_0,d}$ and it follows from our discussion above
that any depth-d circuit computing $\mathrm{SkewedSipser}^{\dagger}_{u_0,d}$ must have size at least

In the rest of the proof we translate (3) into a lower bound for STCONN(k(n)). Following the
explanation given above, we consider the simple graph $G(\mathrm{SkewedSipser}^{\dagger}_{u_0,d})$ with appropriate
parameters. Since $u_0^d \le k/2$, it follows from the same argument used to establish Corollary 3.1 that the
graph $G(\mathrm{SkewedSipser}^{\dagger}_{u_0,d}, z)$ contains an s-to-t path of length at most $2u_0^d \le k$ if and only if we
have $\mathrm{SkewedSipser}^{\dagger}_{u_0,d}(z) = 1$. Because $G(\mathrm{SkewedSipser}^{\dagger}_{u_0,d})$ has no isolated vertices and has $n_0$
edges, it contains at most $2n_0 \le 2n' \le n$ vertices by (1) and (2). Thus, a circuit C that computes
STCONN(k(n)) on undirected simple graphs on n vertices can also be used to compute the formula
$\mathrm{SkewedSipser}^{\dagger}_{u_0,d}$, and (3) yields that C must have size $n^{\Omega(k^{1/d}/d)}$. This completes the first part of
Theorem 1.

It remains to prove the lower bound for STCONN(k'(n)) with $n^{1/5} < k'(n) \le n$. For this,
let $k(n) = n^{1/5}$. We have established above that computing STCONN(k(n)) on subgraphs of
$G(\mathrm{SkewedSipser}^{\dagger}_{u_0,d})$ using depth-d circuits requires size $n^{\Omega(k^{1/d}/d)}$. However, a subgraph of
$G(\mathrm{SkewedSipser}^{\dagger}_{u_0,d})$ contains an s-to-t path of length at most k(n) if and only if it contains
a path from s to t of length at most k'(n) (Observation 5). Consequently, any circuit C that
computes STCONN(k'(n)) on general n-vertex graphs can be used to compute STCONN(k(n)) on
subgraphs of $G(\mathrm{SkewedSipser}^{\dagger}_{u_0,d})$ (by setting some input edges to 0). In particular, C must have
size

$$n^{\Omega(k^{1/d}/d)} = n^{\Omega(n^{1/(5d)}/d)} \ge n^{\Omega(k'^{1/(5d)}/d)}.$$

This completes the second part of Theorem 1. □

Remark 8. It is not hard to see that our reduction in fact also captures other natural graph
problems such as directed k-path (“Is there a directed path of length k(n) in G?”) and directed
k-cycle (“Is there a directed cycle of length k(n) in G?”), and hence the lower bounds of Theorem 1
apply to these problems as well. This suggests the possibility of similarly obtaining other lower
bounds from (variants of) depth hierarchy theorems for Boolean circuits, and we leave this as an
avenue for further investigation.

4 The Random Projection 

In this section we define our random projections, which will be crucial in the proof of Theorem 2.
First, we introduce notation to manipulate the first two layers of $\mathrm{SkewedSipser}_{u,d}$.

Definition 9. For $2 \le u \le w$, we define $\mathrm{CNFSipser}_u$ to be the Boolean function computed by the
following monotone read-once formula:

• The top gate is an AND gate and the bottom-layer gates are OR gates.

• The top AND gate has fan-in u.

• The bottom-layer OR gates all have fan-in $w^{33/100}$.

For $\mathrm{SkewedSipser}_{u,d}$ and each $i \in [d+1]$, we write $\mathrm{OR}^{(i)}$ to denote an OR gate that is in the i-th
level of OR gates away from the input variables and similarly write $\mathrm{AND}^{(i)}$ to denote an AND gate
that is in the i-th level of AND gates away from the input variables. So the root of $\mathrm{SkewedSipser}_{u,d}$
is the only $\mathrm{OR}^{(d+1)}$ gate; each $\mathrm{AND}^{(i)}$ gate has u many $\mathrm{OR}^{(i)}$ gates as its inputs; each $\mathrm{AND}^{(1)}$ gate
of $\mathrm{SkewedSipser}_{u,d}$ computes a disjoint copy of $\mathrm{CNFSipser}_u$.

Next we introduce an addressing scheme for gates and variables of $\mathrm{SkewedSipser}_{u,d}$.

Addressing scheme. Viewing $\mathrm{SkewedSipser}_{u,d}$ as a tree (with its leaves being variables and the
rest being AND, OR gates), we index its nodes (gates or variables) by addresses as follows. The
root (gate) is indexed by ε, the empty string. The j-th child of a node is indexed by the address of
its parent concatenated with j. Thus, the variables of $\mathrm{SkewedSipser}_{u,d}$ are indexed by addresses

$$A(d) := \{(b_0, a_1, b_1, \ldots, a_d, b_d) : a_i \in [u],\ b_0, \ldots, b_{d-1} \in [w],\ b_d \in [w^{33/100}]\}.$$

Block and section decompositions. We will refer to the set of addresses of variables
below an $\mathrm{AND}^{(1)}$ gate as a block, and the set of $w^{33/100}$ addresses of variables below an $\mathrm{OR}^{(1)}$ gate
as a section.

It will be convenient for us to view the set of all variable addresses A(d) as

$$A(d) = B(d) \times A', \quad\text{where}$$

$$B(d) = \{(b_0, a_1, b_1, \ldots, a_{d-1}, b_{d-1}) : a_i \in [u],\ b_i \in [w]\} \quad\text{and}\quad A' = [u] \times [w^{33/100}].$$

Here B(d) can be viewed as the set of addresses of the $\mathrm{AND}^{(1)}$ gates of $\mathrm{SkewedSipser}_{u,d}$, and A' can
be viewed as the set of variable addresses of $\mathrm{CNFSipser}_u$ computed by each such gate (following the
same addressing scheme).

More formally, for a fixed $\beta \in B(d)$ we call the set of addresses

$$A(d, \beta) := \{(\beta, \tau) : \tau \in A'\}$$

a block of A(d); these are the addresses of variables below the $\mathrm{AND}^{(1)}$ gate specified by β. Thus,
A(d) is the disjoint union of $w(uw)^{d-1}$ many blocks, each of cardinality $|A'| = u\,w^{33/100}$.

For a fixed $\beta \in B(d)$ and $a \in [u]$, we call the set of addresses

$$A(d, \beta, a) := \{(\beta, a, b) : b \in [w^{33/100}]\}$$

a section of A(d); these are the addresses of variables below the $\mathrm{OR}^{(1)}$ gate specified by (β, a).
Each block A(d, β) is the disjoint union of u many sections, each of cardinality $w^{33/100}$.

To summarize, the set of addresses of variables A(d) can be decomposed into $w(uw)^{d-1}$ many
blocks A(d, β) (corresponding to the $\mathrm{AND}^{(1)}$ gates), $\beta \in B(d)$, and each such block can be further
decomposed into u many sections A(d, β, a) (corresponding to its u input $\mathrm{OR}^{(1)}$ gates), $a \in [u]$.
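
The block and section decomposition can be made concrete with a short enumeration. The Python sketch below is illustrative only: it uses 0-indexed coordinates instead of the sets [u] and [w], assumes d ≥ 1, takes a free parameter `bottom` in place of $w^{33/100}$, and the names `addresses`, `block_of`, `section_of` and `group_by` are hypothetical.

```python
from collections import defaultdict
from itertools import product


def addresses(u, w, d, bottom):
    """All variable addresses (b0, a1, b1, ..., ad, bd), assuming d >= 1.

    bottom stands in for w^{33/100}, the fan-in of bottom-layer OR gates.
    """
    ranges = [range(w)]                              # b0: child of the root
    for level in range(1, d + 1):
        ranges.append(range(u))                      # a_level: child of an AND
        ranges.append(range(w if level < d else bottom))  # b_level
    return list(product(*ranges))


def block_of(addr):
    """Address beta in B(d) of the AND^(1) gate above the variable."""
    return addr[:-2]


def section_of(addr):
    """Address (beta, a) of the OR^(1) gate above the variable."""
    return addr[:-1]


def group_by(addrs, key):
    groups = defaultdict(list)
    for addr in addrs:
        groups[key(addr)].append(addr)
    return dict(groups)


u, w, d, bottom = 2, 3, 2, 2
A = addresses(u, w, d, bottom)
blocks = group_by(A, block_of)
sections = group_by(A, section_of)
print(len(A), len(blocks), len(sections))
```

For these toy values the counts match the text: $w(uw)^{d-1}$ blocks of size u · `bottom`, each splitting into u sections of size `bottom`.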
Accordingly we also decompose A', the set of variable addresses of $\mathrm{CNFSipser}_u$, into sections

$$A'(a) = \{(a, b) : b \in [w^{33/100}]\}, \quad\text{for } a \in [u].$$

The following fact is trivial given the definition of $\mathrm{CNFSipser}_u$. (Below and subsequently, we
use “ϱ” to denote a restriction to the variables of CNFSipser and “ρ” to denote a restriction to the
variables of SkewedSipser.)

Fact 4.1. For any $a \in [u]$ and restriction $\varrho \in \{0,1,*\}^{A'}$ that sets all variables in the a-th section
A'(a) to 0, i.e., $\varrho_\tau = 0$ for all $\tau \in A'(a)$, we have that $\mathrm{CNFSipser}_u \upharpoonright \varrho \equiv 0$.

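Now we define our random projection operator $\mathrm{proj}_\rho(\cdot)$.

Definition 10 (Projection operators). Given a restriction $\rho \in \{0,1,*\}^{A(d)}$, the projection operator
$\mathrm{proj}_\rho$ maps a function $f\colon \{0,1\}^{A(d)} \to \{0,1\}$ to a function $\mathrm{proj}_\rho(f)\colon \{0,1\}^{B(d)} \to \{0,1\}$, where

$$(\mathrm{proj}_\rho(f))(y) = f(x), \quad\text{where}\quad x_{\beta,\tau} = \begin{cases} y_\beta & \text{if } \rho_{\beta,\tau} = * \\ \rho_{\beta,\tau} & \text{if } \rho_{\beta,\tau} \in \{0,1\}. \end{cases}$$

For convenience, we sometimes write $\mathrm{proj}(f \upharpoonright \rho)$ instead of $\mathrm{proj}_\rho(f)$.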

Remark 11. The following interpretation of the projection operator will come in handy. Given a
restriction $\rho \in \{0,1,*\}^{A(d)}$, if f is computed by a circuit C, then $\mathrm{proj}_\rho(f)$ is computed by a circuit
C' obtained from C by replacing every occurrence of $x_{\beta,\tau}$ by $y_\beta$ if $\rho_{\beta,\tau} = *$, or by $\rho_{\beta,\tau}$ if $\rho_{\beta,\tau} \in \{0,1\}$.

The crux of our random projection operator $\mathrm{proj}_\rho(\cdot)$ is then a distribution $\mathcal{D}_u^{(d)}$ over restrictions
$\{0,1,*\}^{A(d)}$ to the variables $\{x_{\beta,\tau} : (\beta,\tau) \in A(d)\}$, from which ρ is drawn. To this end, we consider
the block decomposition $B(d) \times A'$ of A(d), and $\rho \leftarrow \mathcal{D}_u^{(d)}$ is obtained by drawing independently, for
each block $\beta \in B(d)$, a restriction $\rho_\beta$ from a distribution $\mathcal{D}_u$ over $\{0,1,*\}^{A'}$ to be defined below.

Definition 12 (Distributions $\mathcal{D}_u$ and $\mathcal{D}_u^{(d)}$). The distribution $\mathcal{D}_u = \mathcal{D}_u(q)$ over $\{0,1,*\}^{A'}$ is
parameterized by a probability $q \in (0,1)$. A draw of a restriction ϱ from $\mathcal{D}_u$ is generated as follows:

• With probability q, output $\varrho = \{*\}^{A'}$ (i.e. the restriction fixes no variables).

• Otherwise (with probability 1 − q), we draw $a \leftarrow [u]$ (a random section) and $z \leftarrow \{0,1\}$ (a
random bit) independently and uniformly at random, and output ϱ where for each $\tau \in A'$,

$$\varrho_\tau = \begin{cases} z & \text{if } \tau \in A'(a) \\ 1 - z & \text{otherwise.} \end{cases}$$

Note that in this case ϱ is distributed uniformly among 2u many binary strings in $\{0,1\}^{A'}$.
These strings are “section-monochromatic”, with u − 1 of the sections taking on entirely the
same value 1 − z and the one remaining section a taking entirely the other “rare” value z.

As described above, a draw of $\rho \in \{0,1,*\}^{A(d)}$ from $\mathcal{D}_u^{(d)} = \mathcal{D}_u^{(d)}(q)$ is obtained by independently
drawing $\rho_\beta \leftarrow \mathcal{D}_u = \mathcal{D}_u(q)$ for each block $\beta \in B(d)$.
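
A draw from $\mathcal{D}_u(q)$ for a single block can be sketched as follows. This is a minimal illustration, assuming 0-indexed sections, a dictionary keyed by pairs (a, b) in A', the string `"*"` for unset variables, and a free parameter `bottom` in place of $w^{33/100}$; the names `draw_block` and `in_support` are hypothetical. The checker `in_support` tests the per-block condition that characterises the support (all stars, or section-monochromatic with exactly one “rare” section), assuming u ≥ 2.

```python
import random

STAR = "*"


def draw_block(u, bottom, q, rng):
    """One draw from D_u(q): a dict mapping (a, b) in A' to 0, 1 or '*'."""
    cells = [(a, b) for a in range(u) for b in range(bottom)]
    if rng.random() < q:
        return {cell: STAR for cell in cells}      # fix no variables
    rare_section = rng.randrange(u)                # uniform section a
    z = rng.randrange(2)                           # uniform bit z
    return {(a, b): (z if a == rare_section else 1 - z) for (a, b) in cells}


def in_support(rho, u, bottom):
    """Check the per-block support condition (assumes u >= 2)."""
    values = set(rho.values())
    if values == {STAR}:
        return True
    if STAR in values:
        return False
    per_section = [{rho[(a, b)] for b in range(bottom)} for a in range(u)]
    if any(len(vals) != 1 for vals in per_section):
        return False                               # a section is not monochromatic
    bits = [next(iter(vals)) for vals in per_section]
    return min(bits.count(0), bits.count(1)) == 1  # exactly one rare section


rng = random.Random(0)
sample = draw_block(u=4, bottom=3, q=0.3, rng=rng)
print(in_support(sample, 4, 3))
```

A draw from $\mathcal{D}_u^{(d)}(q)$ would call `draw_block` independently once per block β in B(d).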

The following observation about $\mathrm{supp}(\mathcal{D}_u^{(d)})$ will be useful for us:

Remark 13. A restriction $\rho \in \{0,1,*\}^{A(d)}$ is in the support of $\mathcal{D}_u^{(d)}$ iff for every block $\beta \in B(d)$,
$\rho_\beta$ is either $\{*\}^{A'}$ or there exists exactly one section $a \in [u]$ such that $\rho_{\beta,\tau} = 0$ if $\tau \in A'(a)$ and 1
otherwise, or there exists exactly one section $a \in [u]$ such that $\rho_{\beta,\tau} = 1$ if $\tau \in A'(a)$ and 0 otherwise.

Therefore, if T is a term of width at most u − 1 such that for all blocks $\beta \in B(d)$, the variables
from block β that occur in T all occur with the same sign, then T can be satisfied by a restriction
in the support of $\mathcal{D}_u^{(d)}$ (i.e., $T \upharpoonright \rho \equiv 1$ for some $\rho \in \mathrm{supp}(\mathcal{D}_u^{(d)})$). (Note that this crucially uses the
fact that T has width at most u − 1, and in particular does not contain variables from all u sections
of any block β. Also note that the converse of this is not true, e.g., consider $T = x_{\beta,\tau} \wedge \neg x_{\beta,\tau'}$ with
τ and τ' from two different sections.)


5 Projection Switching Lemma 


Our goal now is to prove the following projection switching lemma for (very) small width DNFs: 


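Theorem 14 (Projection Switching Lemma). For $2 \le u \le w$, let F be an r-DNF over the variables
$\{x_{\beta,\tau} : (\beta,\tau) \in A(d)\}$, where $r \le u - 1$. Then for all $s \ge 1$ and $q \in (0,1)$, we have

$$\Pr_{\boldsymbol{\rho} \leftarrow \mathcal{D}_u^{(d)}(q)}\bigl[\,\mathrm{proj}_{\boldsymbol{\rho}}(F) \text{ has decision tree depth} \ge s\,\bigr] \le \Bigl(\frac{8qru}{1-q}\Bigr)^{s}.$$

Notice that while F is an r-DNF over formal variables $\{x_{\beta,\tau} : (\beta,\tau) \in A(d)\}$, we will bound the
decision tree depth of $\mathrm{proj}_\rho(F)$, a function over the new formal variables $\{y_\beta : \beta \in B(d)\}$.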

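Remark 15. Projections will play a key role in the proof. Consider a term of the form
$T = x_{\beta,\tau} \wedge \neg x_{\beta,\tau'}$ for some $\tau \ne \tau'$, and suppose our ρ from $\mathcal{D}_u^{(d)}$ is such that $\rho_{\beta,\tau} = \rho_{\beta,\tau'} = *$. In this case
we have $T \upharpoonright \rho = x_{\beta,\tau} \wedge \neg x_{\beta,\tau'}$, i.e., the term survives the restriction ρ, but $\mathrm{proj}_\rho(T) = y_\beta \wedge \neg y_\beta \equiv 0$,
i.e., the term is killed by $\mathrm{proj}_\rho$. Our proof will crucially leverage simplifications of this sort.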
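
The projection operator of Definition 10 and the simplification in Remark 15 can be illustrated with a short Python sketch. The representation is an assumption made for the example: functions take a dictionary x keyed by (block, position) pairs, restrictions are dictionaries with values 0, 1 or `"*"`, and the name `project` is hypothetical.

```python
def project(f, rho):
    """Return proj_rho(f) as a function of y, a dict mapping block -> bit.

    Every unset variable x[(beta, tau)] is replaced by the single new
    variable y[beta] of its block; fixed variables take their fixed value.
    """
    def projected(y):
        x = {(beta, tau): (y[beta] if value == "*" else value)
             for (beta, tau), value in rho.items()}
        return f(x)
    return projected


def term(x):
    """T = x_{B,0} AND (NOT x_{B,1}): two literals of opposite sign in block B."""
    return bool(x[("B", 0)]) and not x[("B", 1)]


rho = {("B", 0): "*", ("B", 1): "*"}      # rho leaves both variables unset
projected_term = project(term, rho)

# T restricted by rho is still satisfiable ...
print(term({("B", 0): 1, ("B", 1): 0}))
# ... but its projection is identically 0, since both literals read y_B.
print([projected_term({"B": bit}) for bit in (0, 1)])
```

The term survives the restriction as a non-constant function of x, yet becomes the constant 0 once both of its variables are identified with the same $y_\beta$.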

Remark 16. The parameters of Theorem 14 are quite delicate in the sense that the statement
fails to hold for DNFs of width u. To see this, consider $\mathrm{SkewedSipser}_{u,d}$ with d = 1, a depth-3
formula that can also be written as a u-DNF. Then by Corollary 6.2 (to be introduced in Section
6), we have that for $\rho \leftarrow \mathcal{D}_u^{(1)}(q)$ with $q = w^{-669/1000}$, the function $\mathrm{proj}_\rho(\mathrm{SkewedSipser}_{u,1})$ contains
a $w^{33/100}$-way OR as a subfunction (and hence has decision tree depth at least $w^{33/100}$) with
probability 1 − o(1). So while the statement of Theorem 14 holds for (u − 1)-DNFs, it does not
hold for u-DNFs when u = and W —)■ oo. 

Remark 17. We observe that the conclusion of Theorem 14 still holds if the condition “F is an 
r-CNF” replaces “F is an r-DNF.” This can be shown either by a straightforward adaptation of 
our proof, or via a reduction to the DNF case using duality, the invariance of our distribution of 
random projections under the operation of flipping each bit, and the fact that decision tree depth 
does not change when input variables and output value are negated. 


5.1 Canonical decision tree 

Given an r-DNF F over variables $\{x_{\beta,\tau} : (\beta,\tau) \in A(d)\}$ and a restriction $\rho \in \{0,1,*\}^{A(d)}$, $\mathrm{proj}_\rho(F)$
is a function over the new variables $\{y_\beta : \beta \in B(d)\}$. We assume a fixed but arbitrary ordering on
the terms in F, and the variables within terms. The canonical decision tree CanonicalDT(F, ρ) that
computes $\mathrm{proj}_\rho(F)$ is defined inductively as follows.

CanonicalDT(F, ρ):

0. If $\mathrm{proj}_\rho(F) \equiv 0$ or 1, output 0 or 1, respectively.

1. Otherwise, let T be the first term in F such that $T \upharpoonright \rho$ is non-constant and $T \upharpoonright \rho\rho' \equiv 1$ for
some $\rho' \in \mathrm{supp}(\mathcal{D}_u^{(d)})$. We observe that such a term must exist, or the procedure would have
halted at step 0 above and not reached the current step 1.

To see this, first note that certainly there must exist a term T' such that $T' \upharpoonright \rho$ is non-constant
since otherwise $F \upharpoonright \rho$ is constant (and likewise $\mathrm{proj}_\rho(F)$). We furthermore claim that among
these terms T', there must exist one such that $T' \upharpoonright \rho$ is satisfiable by some $\rho' \in \mathrm{supp}(\mathcal{D}_u^{(d)})$, i.e.
$T' \upharpoonright \rho\rho' \equiv 1$. To prove this, suppose that each of these terms T' satisfies that $T' \upharpoonright \rho$ is
non-constant and there exists no restriction $\rho' \in \mathrm{supp}(\mathcal{D}_u^{(d)})$ such that $T' \upharpoonright \rho\rho' \equiv 1$. By Remark 13
(and our assumption that $r \le u - 1$), $T' \upharpoonright \rho$ must contain two literals from the same block
occurring with opposite signs, i.e., $x_{\beta,\tau}$ and $\neg x_{\beta,\tau'}$, for some $\beta \in B(d)$. In this case, we have
that $\mathrm{proj}_\rho(T')$ contains both $y_\beta$ and $\neg y_\beta$ and hence $\mathrm{proj}_\rho(T') \equiv 0$. But if each such term T'
has $\mathrm{proj}_\rho(T') \equiv 0$, then $\mathrm{proj}_\rho(F) \equiv 0$ and the procedure would have halted at step 0.

2. Define

$$\eta = \{\beta \in B(d) : x_{\beta,\tau} \text{ or } \neg x_{\beta,\tau} \text{ occurs in } T \upharpoonright \rho \text{ for some } \tau\}.$$

Our canonical decision tree will then query variables $y_\beta$, $\beta \in \eta$, exhaustively, i.e., we grow a
complete binary tree of depth |η|; we will refer to T as the term of this tree.

3. For every assignment $\pi \in \{0,1\}^{\eta}$ to variables $y_\beta$, $\beta \in \eta$ (equivalently, every path through the
complete binary tree of depth |η|), we recurse on CanonicalDT(F, ρ(η ↦ π)), where we use
$(\eta \mapsto \pi) \in \{0,1,*\}^{A(d)}$ to denote the following restriction:

$$(\eta \mapsto \pi)_{\beta,\tau} = \begin{cases} \pi_\beta & \text{if } \beta \in \eta \\ * & \text{otherwise} \end{cases} \quad\text{for all } \beta \in B(d) \text{ and } \tau \in A'. \tag{4}$$

Proposition 5.1. For every $\rho \in \{0,1,*\}^{A(d)}$, we have that CanonicalDT(F, ρ) computes $\mathrm{proj}_\rho(F)$.

While CanonicalDT is well defined for all ρ, we shall mostly be interested in $\rho \in \mathrm{supp}(\mathcal{D}_u^{(d)})$.

5.2 Proof of Theorem 14 

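Let

$$\mathcal{B} := \{\rho \in \mathrm{supp}(\mathcal{D}_u^{(d)}) : \text{decision tree depth of CanonicalDT}(F, \rho) \ge s\}$$

be the set of bad restrictions. To prove Theorem 14, it suffices to bound $\Pr_{\boldsymbol{\rho} \leftarrow \mathcal{D}_u^{(d)}(q)}[\boldsymbol{\rho} \in \mathcal{B}]$, the
total weight of $\mathcal{B}$ under $\mathcal{D}_u^{(d)}(q)$. Following Razborov’s strategy (see [Bea95] for more details), we
will construct a map

$$\theta\colon \mathcal{B} \to \{0,1,*\}^{A(d)} \times \{0,1\}^{s} \times \{0,1\}^{s(\log r + 1)}$$

with the following two key properties:

1. (injection) $\theta(\rho) \ne \theta(\rho')$ for any two distinct restrictions $\rho, \rho' \in \mathcal{B}$;

2. (weight increase) Let $\theta_1(\rho) \in \{0,1,*\}^{A(d)}$ denote the first component of θ(ρ). Then

$$\frac{\Pr[\boldsymbol{\rho} = \theta_1(\rho)]}{\Pr[\boldsymbol{\rho} = \rho]} \ge \Gamma \quad\text{for all } \rho \in \mathcal{B}, \tag{5}$$

where $\Gamma = ((1 - q)/2qu)^{s}$ is “large”.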

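Assuming such a map θ exists (below we describe its construction and prove the two properties
stated above), Theorem 14 follows from a simple combinatorial argument.

Proof of Theorem 14. Fix a pair $\Theta \in \{0,1\}^{s} \times \{0,1\}^{s(\log r + 1)}$ and let

$$\mathcal{B}_\Theta = \{\rho \in \mathcal{B} : (\theta_2(\rho), \theta_3(\rho)) = \Theta\} \subseteq \mathcal{B},$$

where we use $\theta_2(\rho)$ and $\theta_3(\rho)$ to denote the second and third components of θ(ρ), respectively.
Then we have that

$$\Pr[\boldsymbol{\rho} \in \mathcal{B}_\Theta] = \sum_{\rho \in \mathcal{B}_\Theta} \Pr[\boldsymbol{\rho} = \rho] \le (1/\Gamma) \cdot \sum_{\rho \in \mathcal{B}_\Theta} \Pr[\boldsymbol{\rho} = \theta_1(\rho)] \le 1/\Gamma.$$

Here the first inequality uses (5) and the second inequality uses the property of θ being an injection:
we have that $\theta_1(\rho) \ne \theta_1(\rho')$ for any two distinct $\rho, \rho' \in \mathcal{B}_\Theta$ (recall that $\theta_2(\rho) = \theta_2(\rho')$ and $\theta_3(\rho) =
\theta_3(\rho')$), and therefore $\sum_{\rho \in \mathcal{B}_\Theta} \Pr[\boldsymbol{\rho} = \theta_1(\rho)] \le 1$. Summing up over all possible Θ’s, we have

$$\Pr[\boldsymbol{\rho} \in \mathcal{B}] = \sum_{\Theta} \Pr[\boldsymbol{\rho} \in \mathcal{B}_\Theta] \le 2^{s} \cdot (2r)^{s} \cdot (2qu/(1 - q))^{s} = (8qru/(1 - q))^{s}$$

and this concludes the proof of Theorem 14. □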
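
For readers who want the counting step spelled out, the following LaTeX fragment expands the final sum. It assumes logarithms are base 2 (so that a string of length $s \log r$ ranges over $r^s$ values) and writes $\mathcal{B}$ for the set of bad restrictions; it restates the computation above rather than adding anything new.

```latex
\begin{align*}
\#\{\Theta\}
  &= \bigl|\{0,1\}^{s}\bigr| \cdot \bigl|\{0,1\}^{s(\log r + 1)}\bigr|
   = 2^{s} \cdot 2^{s \log r} \cdot 2^{s}
   = 2^{s} \cdot (2r)^{s}, \\
\Pr[\boldsymbol{\rho} \in \mathcal{B}]
  &\le \#\{\Theta\} \cdot \frac{1}{\Gamma}
   = (4r)^{s} \cdot \Bigl(\frac{2qu}{1-q}\Bigr)^{s}
   = \Bigl(\frac{8qru}{1-q}\Bigr)^{s}.
\end{align*}
```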

The rest of the section is organized as follows. We construct the map θ in Section 5.3. Then we
show that it is an injection in Section 5.4, by showing that one can decode ρ from θ(ρ) uniquely for
any $\rho \in \mathcal{B}$. Finally we prove the weight increase, i.e., (5), in Section 5.5.


5.3 Encoding 

Let $\rho \in \mathcal{B}$ be a bad restriction. Let $\pi^*$ be the lexicographically first path of length at least s in the
decision tree CanonicalDT(F, ρ) (witnessing the badness of ρ), and π be its truncation at length s.
Then $\theta_2(\rho)$ is defined to be $\mathrm{binary}(\pi) \in \{0,1\}^{s}$, the binary representation of π, i.e., $\pi_i \in \{0,1\}$ is
the evaluation of the i-th y-variable along π.

Recall that CanonicalDT(F, ρ) is composed of a collection of complete binary trees, one for each
recursive call of CanonicalDT. Let $R_1, \ldots, R_{s'}$ for some $1 \le s' \le s$ denote the sequence of complete
binary trees that π visits, with $R_1$ sharing the same root as CanonicalDT(F, ρ) and π ending in $R_{s'}$.
(Here $s' \ge 1$ because $s \ge 1$.) We also use $T_i$ to denote the term of tree $R_i$, for each $i \in [s']$.

For each $i \in [s' - 1]$, we let

$$\eta_i = \{\beta \in B(d) : y_\beta \text{ is queried in tree } R_i\},$$

and for the special case of i = s', we let

$$\eta_{s'} = \{\beta \in B(d) : y_\beta \text{ is queried in tree } R_{s'} \text{ before the end of } \pi\}. \tag{6}$$
For each $i \in [s']$, π induces a binary string $\pi^{(i)} \in \{0,1\}^{\eta_i}$, where for each $\beta \in \eta_i$, $\pi^{(i)}_\beta$ is set to be the
evaluation of $y_\beta$ along π (in tree $R_i$). Note that $T_i$ is the i-th term processed by CanonicalDT(F, ρ)
along the bad path π and equivalently, $T_i$ is the first term processed by

$$\text{CanonicalDT}\bigl(F,\ \rho(\eta_1 \mapsto \pi^{(1)}) \cdots (\eta_{i-1} \mapsto \pi^{(i-1)})\bigr),$$

where $(\eta_j \mapsto \pi^{(j)})$ is a restriction defined as in (4). So $T_i$ is the first term in F such that

$$T_i \upharpoonright \rho(\eta_1 \mapsto \pi^{(1)}) \cdots (\eta_{i-1} \mapsto \pi^{(i-1)})$$

is non-constant and

$$T_i \upharpoonright \rho(\eta_1 \mapsto \pi^{(1)}) \cdots (\eta_{i-1} \mapsto \pi^{(i-1)})\rho' \equiv 1, \quad\text{for some } \rho' \in \mathrm{supp}(\mathcal{D}_u^{(d)}).$$

At a high level, $\theta_1(\rho)$ and $\theta_3(\rho)$ are defined as follows. The third component

$$\theta_3(\rho) = \mathrm{encode}(\eta_1) \circ \cdots \circ \mathrm{encode}(\eta_{s'}) \in \{0,1\}^{s(\log r + 1)}$$

is the concatenation of s' binary strings, where each $\mathrm{encode}(\eta_i)$ is a concise representation of $\eta_i$.
In particular, we are able to recover $\eta_i$ given both $\mathrm{encode}(\eta_i)$ and $T_i$. We describe the encoding of
$\eta_i$ in Section 5.3.1. For the first component we have

$$\theta_1(\rho) = \rho\sigma^{(1)} \cdots \sigma^{(s')} \in \{0,1,*\}^{A(d)},$$

where each $\sigma^{(i)} \in \{0,1,*\}^{A(d)}$ is a restriction and $\rho\sigma^{(1)} \cdots \sigma^{(s')}$ is their composition (note that each
of these s' + 1 restrictions, like the overall composition, belongs to $\{0,1,*\}^{A(d)}$). We define the $\sigma^{(i)}$’s
in Section 5.3.2.

5.3.1 Encoding $\eta_i$

Fix an $i \in [s']$. Let $\eta_i = \{\beta_1, \ldots, \beta_t\}$ for some $t \ge 1$, with $\beta_j$’s ordered lexicographically. It follows
from the definition of $\eta_i$ that every $\beta_j$ appears in $T_i$, meaning that either $x_{\beta_j,\tau}$ or $\neg x_{\beta_j,\tau}$ appears in
$T_i$ for some $\tau \in A'$.

Instead of encoding each $\beta_j$ directly using its binary representation, we use log r bits to encode
the index of the first $x_{\beta_j,\cdot}$ or $\neg x_{\beta_j,\cdot}$ variable that occurs in $T_i$. Here log r bits suffice because $T_i$ has
at most r variables. Also recall that we fixed an ordering on the variables of each term, so indices of
variables in $T_i$ are well defined. We let $\mathrm{location}(\beta_j)$ denote the log r bits for $\beta_j$. We also append it
with one additional bit to indicate whether $\beta_j$ is the last element in $\eta_i$. More formally, we write

$$\mathrm{encode}(\eta_i) = \mathrm{location}(\beta_1) \circ 0 \circ \mathrm{location}(\beta_2) \circ 0 \circ \cdots \circ \mathrm{location}(\beta_t) \circ 1 \in \{0,1\}^{t(\log r + 1)}.$$

We summarize properties of $\theta_3(\rho)$ below:

Proposition 5.2. Given $\theta_3(\rho)$, one can recover uniquely s' and $\mathrm{encode}(\eta_1), \ldots, \mathrm{encode}(\eta_{s'})$.
Furthermore, given $\mathrm{encode}(\eta_i)$ and $T_i$ for some $i \in [s']$, one can recover uniquely $\eta_i$.
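
The encoding of $\eta_i$ and its decoding given $T_i$ can be sketched in Python as follows. Several details are assumptions made for illustration: a term is represented only by the list `term_blocks` of the blocks of its variables in the term's fixed order, blocks are any sortable labels, “log r bits” is taken to mean ⌈log₂ r⌉ bits (at least one), and the names `encode_eta` and `decode_eta` are hypothetical.

```python
from math import ceil, log2


def encode_eta(term_blocks, eta, r):
    """Encode the non-empty set eta of blocks relative to a term of width <= r.

    Each block is written as the index of its first variable in the term
    (location), followed by a flag bit: 1 for the last block, 0 otherwise.
    """
    width = max(1, ceil(log2(r)))
    blocks = sorted(eta)                       # lexicographic order
    bits = ""
    for j, beta in enumerate(blocks):
        index = term_blocks.index(beta)        # first occurrence in the term
        bits += format(index, f"0{width}b")
        bits += "1" if j == len(blocks) - 1 else "0"
    return bits


def decode_eta(term_blocks, bits, r):
    """Recover eta from a prefix of bits; also return how many bits were read."""
    width = max(1, ceil(log2(r)))
    eta, pos = set(), 0
    while True:
        index = int(bits[pos:pos + width], 2)
        eta.add(term_blocks[index])
        is_last = bits[pos + width] == "1"
        pos += width + 1
        if is_last:
            return eta, pos


term_blocks = ["b3", "b1", "b3", "b7"]         # blocks of the term's 4 variables
code = encode_eta(term_blocks, {"b3", "b7"}, r=4)
print(code, decode_eta(term_blocks, code, r=4))
```

Because `decode_eta` reports how many bits it consumed, a concatenation of several such codes can be parsed left to right, which mirrors the first claim of Proposition 5.2; recovering the blocks themselves requires the term, as in the second claim.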
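5.3.2 The restriction $\sigma^{(i)}$

We now define $\sigma^{(i)}$ for a general $i \in [s']$. For ease of notation we define the restriction

$$\rho^{(i-1)} := \rho(\eta_1 \mapsto \pi^{(1)}) \cdots (\eta_{i-1} \mapsto \pi^{(i-1)}) \in \{0,1,*\}^{A(d)}.$$

Note that $\rho^{(0)} = \rho$. Recalling our CanonicalDT algorithm and the definition of $T_i$ as the i-th term
processed by CanonicalDT(F, ρ), we have that $T_i$ is the first term in F such that $T_i \upharpoonright \rho^{(i-1)}$ is
non-constant and $T_i \upharpoonright \rho^{(i-1)}\rho' \equiv 1$ for some $\rho' \in \mathrm{supp}(\mathcal{D}_u^{(d)})$. Therefore, we have

$$\eta_i = \{\beta \in B(d) : x_{\beta,\tau} \text{ or } \neg x_{\beta,\tau} \text{ occurs in } T_i \upharpoonright \rho^{(i-1)} \text{ for some } \tau \in A'\}.$$

We define $\sigma^{(i)} \in \{0,1,*\}^{A(d)}$ to be an arbitrary restriction (say the lexicographic first under
the ordering $0 \prec 1 \prec *$) satisfying the following three properties:

1. $T_i \upharpoonright \rho^{(i-1)}\sigma^{(i)} \not\equiv 0$, and

2. $\sigma^{(i)} \in \mathrm{supp}(\mathcal{D}_u^{(d)})$, and

3. $\sigma^{(i)}_\beta \in \{0,1\}^{A'}$ for all $\beta \in \eta_i$, and $\sigma^{(i)}_\beta = \{*\}^{A'}$ for all $\beta \notin \eta_i$.

In words, $\sigma^{(i)}$ is the lexicographic first restriction in $\mathrm{supp}(\mathcal{D}_u^{(d)})$ that completely fixes blocks $\beta \in \eta_i$,
leaves all other blocks $\beta \notin \eta_i$ free, and fixes the blocks in $\eta_i$ in a way that does not falsify $T_i \upharpoonright \rho^{(i-1)}$.

For $1 \le i < s'$, we recall that $\eta_i$ contains all blocks with variables occurring in $T_i \upharpoonright \rho^{(i-1)}$, and so
property (1) above can in fact be stated as $T_i \upharpoonright \rho^{(i-1)}\sigma^{(i)} \equiv 1$. (This is not necessarily true for the
special case of i = s' since $\eta_{s'}$ may only contain a subset of the blocks with variables occurring in
$T_{s'} \upharpoonright \rho^{(s'-1)}$; c.f. (6).)

We observe that such a restriction $\sigma^{(i)}$ (one satisfying all three properties above) must exist.
As remarked at the start of this subsection, by the definition of $T_i$ there exists a restriction
$\rho' \in \mathrm{supp}(\mathcal{D}_u^{(d)})$ such that $T_i \upharpoonright \rho^{(i-1)}\rho' \equiv 1$. This along with the fact that $\mathcal{D}_u^{(d)}$ is independent across
blocks implies the existence of a restriction in $\mathrm{supp}(\mathcal{D}_u^{(d)})$ that fixes exactly the blocks in $\eta_i$ in a
way that does not falsify $T_i \upharpoonright \rho^{(i-1)}$.

This finishes the definition of $\sigma^{(i)}$. We record the following key properties of $\sigma^{(i)}$:

Proposition 5.3. $T_i \upharpoonright \rho^{(i-1)}\sigma^{(i)} \equiv 1$ for $1 \le i < s'$, and $T_{s'} \upharpoonright \rho^{(s'-1)}\sigma^{(s')} \not\equiv 0$.

Proposition 5.4. For every $\beta \in \eta_i$, we have $\rho_\beta = \{*\}^{A'}$ whereas $\sigma^{(i)}_\beta \in \{0,1\}^{A'}$, and

$$\Pr_{\boldsymbol{\varrho} \leftarrow \mathcal{D}_u(q)}[\boldsymbol{\varrho} = \rho_\beta] = q \quad\text{whereas}\quad \Pr_{\boldsymbol{\varrho} \leftarrow \mathcal{D}_u(q)}[\boldsymbol{\varrho} = \sigma^{(i)}_\beta] = \frac{1 - q}{2u}.$$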


5.4 Decodability 

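Lemma 5.5. The map $\theta\colon \mathcal{B} \to \{0,1,*\}^{A(d)} \times \{0,1\}^{s} \times \{0,1\}^{s(\log r + 1)}$, where

$$\theta(\rho) = \bigl(\rho\sigma^{(1)} \cdots \sigma^{(s')},\ \mathrm{binary}(\pi),\ \mathrm{encode}(\eta_1) \circ \cdots \circ \mathrm{encode}(\eta_{s'})\bigr), \tag{7}$$

is an injection.

We will prove Lemma 5.5 by describing a decoder that can recover $\rho \in \mathcal{B}$ given θ(ρ) as in (7).
Let $\sigma = \sigma^{(1)} \cdots \sigma^{(s')}$. Note that s' can be derived from $\theta_3(\rho)$. To obtain ρ, it suffices to recover the
sets $\eta_1, \ldots, \eta_{s'}$; ρ is then obtained by simply replacing $(\rho\sigma)_{\beta,\tau}$ with * for all $\beta \in \eta_1 \cup \cdots \cup \eta_{s'}$ and all $\tau \in A'$.

To recover the $\eta_i$’s, we assume inductively that the decoder has recovered the “hybrid” restriction

$$\rho(\eta_1 \mapsto \pi^{(1)}) \cdots (\eta_{i-1} \mapsto \pi^{(i-1)})\sigma^{(i)} \cdots \sigma^{(s')} \quad\text{and the sets } \eta_1, \ldots, \eta_{i-1}, \tag{8}$$

with the base case i = 1 being $\rho\sigma^{(1)} \cdots \sigma^{(s')} = \theta_1(\rho)$, which is trivially true by assumption. We will
show below how to decode $T_i$ and $\eta_i$, and then obtain the next “hybrid” restriction

$$\rho^{(i)}\sigma^{(i+1)} \cdots \sigma^{(s')} = \rho(\eta_1 \mapsto \pi^{(1)}) \cdots (\eta_i \mapsto \pi^{(i)})\sigma^{(i+1)} \cdots \sigma^{(s')}.$$

We can recover all s' sets $\eta_1, \ldots, \eta_{s'}$ after repeating this s' times.

The following proposition shows how to recover $T_i$, given the “hybrid” restriction in (8).

Proposition 5.6. For $1 \le i < s'$, we have that $T_i$ is the first term in F such that

$$T_i \upharpoonright \rho^{(i-1)}\sigma^{(i)} \cdots \sigma^{(s')} \equiv 1.$$

For the special case of i = s', we have that $T_{s'}$ is the first term in F such that

$$T_{s'} \upharpoonright \rho^{(s'-1)}\sigma^{(s')}\rho'' \equiv 1$$

for some $\rho'' \in \mathrm{supp}(\mathcal{D}_u^{(d)})$.

Proof. We first justify the claim for $1 \le i < s'$. Recall that $T_i$ is the first term in F such that
$T_i \upharpoonright \rho^{(i-1)}$ is non-constant and $T_i \upharpoonright \rho^{(i-1)}\rho' \equiv 1$ for some restriction $\rho' \in \mathrm{supp}(\mathcal{D}_u^{(d)})$. This
together with Proposition 5.3 implies that $T_i$ is the first term in F such that $T_i \upharpoonright \rho^{(i-1)}\sigma^{(i)} \equiv 1$; as
$\sigma^{(i)} \in \mathrm{supp}(\mathcal{D}_u^{(d)})$, it follows that $\sigma^{(i)}$ cannot satisfy any term that occurs before $T_i$ in F.
For the same reason, $T_i$ remains the first term in F such that $T_i \upharpoonright \rho^{(i-1)}\sigma^{(i)} \cdots \sigma^{(s')} \equiv 1$ (since
$\sigma^{(i)}, \ldots, \sigma^{(s')} \in \mathrm{supp}(\mathcal{D}_u^{(d)})$ and so is their composition).

The argument for i = s' is similar. We again recall that $T_{s'}$ is the first term in F such that
$T_{s'} \upharpoonright \rho^{(s'-1)}$ is non-constant and $T_{s'} \upharpoonright \rho^{(s'-1)}\rho' \equiv 1$ for some restriction $\rho' \in \mathrm{supp}(\mathcal{D}_u^{(d)})$. Since every
term T that occurs before $T_{s'}$ in F is such that $T \upharpoonright \rho^{(s'-1)}\rho' \not\equiv 1$ for all $\rho' \in \mathrm{supp}(\mathcal{D}_u^{(d)})$, certainly
$T \upharpoonright \rho^{(s'-1)}\sigma^{(s')}\rho'' \not\equiv 1$ for all $\rho'' \in \mathrm{supp}(\mathcal{D}_u^{(d)})$ as well. On the other hand, by Proposition 5.3 we
have that $\sigma^{(s')}$ does not falsify $T_{s'} \upharpoonright \rho^{(s'-1)}$ and so there must exist $\rho'' \in \mathrm{supp}(\mathcal{D}_u^{(d)})$ such that
$T_{s'} \upharpoonright \rho^{(s'-1)}\sigma^{(s')}\rho'' \equiv 1$. This completes the proof. □

With $T_i$ in hand we use $\mathrm{encode}(\eta_i)$ to reconstruct $\eta_i$ by Proposition 5.2. We modify the current
“hybrid” restriction $\rho(\eta_1 \mapsto \pi^{(1)}) \cdots (\eta_{i-1} \mapsto \pi^{(i-1)})\sigma^{(i)} \cdots \sigma^{(s')}$ as follows: for each $\beta \in \eta_i$, set

$$\bigl[\rho(\eta_1 \mapsto \pi^{(1)}) \cdots (\eta_{i-1} \mapsto \pi^{(i-1)})\sigma^{(i)} \cdots \sigma^{(s')}\bigr]_{\beta,\tau} = \pi^{(i)}_\beta, \quad\text{for all } \tau \in A'.$$

The resulting restriction is $\rho(\eta_1 \mapsto \pi^{(1)}) \cdots (\eta_i \mapsto \pi^{(i)})\sigma^{(i+1)} \cdots \sigma^{(s')}$, as desired.

Starting with ρσ and repeating this procedure s' times, we recover all $\eta_i$’s and then ρ. This
completes the proof that θ is an injection.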
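5.5 Weight increase

Recall that ρ and ρσ differ in exactly s many blocks, and furthermore, ρ is $\{*\}^{A'}$ on all these blocks
whereas ρσ belongs to $\{0,1\}^{A'} \cap \mathrm{supp}(\mathcal{D}_u)$ on these blocks.

Lemma 5.7. For any $\rho \in \mathcal{B}$ and $\rho\sigma = \theta_1(\rho)$, we have

$$\frac{\Pr[\boldsymbol{\rho} = \rho\sigma]}{\Pr[\boldsymbol{\rho} = \rho]} = \prod_{\substack{\text{blocks } \beta \text{ on}\\ \text{which they differ}}} \frac{\Pr[\boldsymbol{\varrho} = (\rho\sigma)_\beta]}{\Pr[\boldsymbol{\varrho} = \rho_\beta]} = \Bigl(\frac{1 - q}{2qu}\Bigr)^{s}.$$

Proof. This follows from independence across blocks and Proposition 5.4. □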

6 Proof of Theorem 2 

In this section we prove our main technical result, Theorem 2, restated below:

Theorem 2. There is an absolute constant c > 0 such that the following holds. Let d = d(w)
and u = u(w) satisfy $d \ge 1$ and $2 \le u \le w^{33/100}$. Then for w sufficiently large, any depth-d
circuit computing the $\mathrm{SkewedSipser}_{u,d}$ function (recall that this is a formula of depth 2d + 1 over
$n = u^d w^{d+33/100}$ variables) must have size at least $n^{c \cdot (u/d)}$.

We begin by first observing that the claimed circuit size lower bound is o(n), and hence
vacuous, if d > u; thus it suffices to prove the claimed bound under the assumption that $d \le u$. We
make this assumption in the rest of the proof below (see specifically Corollary 6.2). Of course we
can also assume that $d \ge 2$, since depth-1 circuits of any size cannot compute $\mathrm{SkewedSipser}_{u,d}$. In
the proof we set the parameter q to be $q = w^{-669/1000}$.

In Section 6.1 we establish that our target function $\mathrm{SkewedSipser}_{u,d}$ retains structure with high
probability under a suitable random projection. In Section 6.2 we repeatedly apply both this result
and our projection switching lemma to prove Theorem 2.

6.1 Target preservation 

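We start with an easy proposition about what happens to $\mathrm{CNFSipser}_u$ under a random restriction
from $\mathcal{D}_u(q)$. The following is an immediate consequence of Definition 12 and Fact 4.1:

Proposition 6.1. For $\boldsymbol{\varrho} \leftarrow \mathcal{D}_u(q)$, we have that

$$\mathrm{CNFSipser}_u \upharpoonright \boldsymbol{\varrho} = \begin{cases} \mathrm{CNFSipser}_u & \text{with probability } q \\ 0 & \text{with probability } 1 - q. \end{cases}$$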
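
The second case of Proposition 6.1 can be verified by brute force for small parameters: every restriction that $\mathcal{D}_u(q)$ outputs in its “otherwise” branch fixes all variables and leaves at least one section entirely 0, so the CNF evaluates to 0. The Python sketch below is an illustration under assumed conventions (0-indexed sections, a free parameter `bottom` in place of $w^{33/100}$, hypothetical names `cnf_sipser` and `fixed_restrictions`).

```python
def cnf_sipser(x, u, bottom):
    """AND over u sections of the OR of the section's variables."""
    return all(any(x[(a, b)] for b in range(bottom)) for a in range(u))


def fixed_restrictions(u, bottom):
    """All outcomes of the non-star branch of D_u(q): one per (section, bit)."""
    for rare_section in range(u):
        for z in (0, 1):
            yield {(a, b): (z if a == rare_section else 1 - z)
                   for a in range(u) for b in range(bottom)}


u, bottom = 3, 2
outcomes = list(fixed_restrictions(u, bottom))
print(len(outcomes), any(cnf_sipser(rho, u, bottom) for rho in outcomes))
```

If z = 0 the rare section is all 0, and if z = 1 the other u − 1 ≥ 1 sections are all 0; in both cases some bottom OR gate is falsified, as Fact 4.1 requires.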

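We obtain the following corollary.

Corollary 6.2. For every $1 \le i \le d$, we have that $\mathrm{proj}_{\boldsymbol{\rho}}(\mathrm{SkewedSipser}_{u,i})$ contains $\mathrm{SkewedSipser}_{u,i-1}$
as a subfunction with probability at least 0.9 over a random restriction $\boldsymbol{\rho} \leftarrow \mathcal{D}_u^{(i)}(q)$.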
Proof. Recall that $\boldsymbol{\rho} \leftarrow \mathcal{D}_u^{(i)}(q)$ is drawn by independently drawing $\boldsymbol{\rho}_\beta \leftarrow \mathcal{D}_u(q)$ for each block
$\beta \in B(i)$. We have that $\mathrm{proj}_{\boldsymbol{\rho}}(\mathrm{SkewedSipser}_{u,i})$ contains $\mathrm{SkewedSipser}_{u,i-1}$ as a subfunction if the
following holds: for each of the $\mathrm{OR}^{(2)}$ gates in $\mathrm{SkewedSipser}_{u,i}$, at least $w^{33/100}$ of the w $\mathrm{AND}^{(1)}$ gates
(each one corresponding to an independent $\mathrm{CNFSipser}_u$ function) that are its children (say at
addresses $\beta_1, \ldots, \beta_w$) have $\boldsymbol{\rho}_{\beta_j} = \{*\}^{A'}$.

By Proposition 6.1, for a given $\mathrm{OR}^{(2)}$ gate, the expected number of $\beta_j$’s beneath it that have
$\boldsymbol{\rho}_{\beta_j} = \{*\}^{A'}$ is $qw = w^{331/1000}$. So a multiplicative Chernoff bound shows that at least $w^{33/100}$

of the PiS beneath it have pj^. G except with failure probability at most By a 

union bound over the (at most n) OR^^i gates in SkewedSipser^ we have that the overall failure 
probability is at most n ■ Since 


the proof is complete. (In the above we used u < -u;33/ioo ^j^g £j.g£ inequality, d < u for the 
second, u < i(;33/ioo ^gain for the third, and w being sufficiently large for the last.) □ 
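
The two quantitative steps of this proof (the Chernoff bound and the comparison with $n$) can be sketched as follows. This is only an illustration: it assumes the standard multiplicative Chernoff lower-tail bound $\Pr[X \le (1-\delta)\mu] \le e^{-\delta^2 \mu / 2}$ for a sum $X$ of independent 0/1 variables with mean $\mu$, and the constants $1/8$ and $2$ below are choices made for this sketch, not necessarily the ones used by the authors.

```latex
% X = number of all-* blocks beneath a fixed OR gate; \mu = E[X] = qw = w^{331/1000}.
% Threshold: w^{33/100} = (1-\delta)\mu with \delta = 1 - w^{-1/1000} \ge 1/2 for w large.
\[
  \Pr\bigl[X < w^{33/100}\bigr] \;\le\; \exp\!\bigl(-\delta^{2}\mu/2\bigr)
  \;\le\; \exp\!\bigl(-w^{331/1000}/8\bigr).
\]
% Union bound over at most n gates, with n = u^{d} w^{d+33/100}:
\[
  n \;\le\; w^{\frac{33}{100}d + d + \frac{33}{100}}
    \;\le\; w^{2u}
    \;\le\; w^{2w^{33/100}}
    \;=\; \exp\!\bigl(2\,w^{33/100}\ln w\bigr),
\]
\[
  n\cdot\exp\!\bigl(-w^{331/1000}/8\bigr)
  \;\le\; \exp\!\bigl(2\,w^{33/100}\ln w - w^{331/1000}/8\bigr) \;\le\; 0.1
  \quad\text{for $w$ sufficiently large.}
\]
```

The four steps in the last two displays use, in order, $u \le w^{33/100}$, $d \le u$ (together with $u \ge 2$), $u \le w^{33/100}$ again, and $w$ being sufficiently large, which is the same pattern as the parenthetical remark closing the proof.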


6.2 Completing the Proof of Theorem 2 


Most of the proof is devoted to showing that the required size for a depth-$d$ circuit that computes
$\mathsf{SkewedSipser}_{u,d}$ is at least

$$S \;\stackrel{\mathrm{def}}{=}\; \frac{0.1}{(16 u^2 q)^{u-1}}. \tag{9}$$


We prove (9) by contradiction; so assume there is a depth-$d$ circuit $C$ of size at most $S$ that
computes $\mathsf{SkewedSipser}_{u,d}$. As noted in Section 2.2 we assume that $C$ is alternating and leveled.

We “get the argument off the ground” by first hitting both $\mathsf{SkewedSipser}_{u,d}$ and $C$ with $\mathrm{proj}_\rho(\cdot)$
for $\rho \leftarrow \mathcal{D}_u^{(d)}(q)$, where $q = w^{-669/1000}$. (By Remark 17, we can apply our projection switching
lemma, Theorem 14, both to $r$-DNFs and $r$-CNFs.) Applying Theorem 14 (with $r = 1$ and
$s = u - 1$) to each of the gates at distance 1 from the inputs in $C$,† we have that the resulting
circuit $\mathrm{proj}_\rho(C)$ has depth $d$, bottom fan-in $u - 1$, and at most $S$ gates at distance at least 2 from
the inputs‡ with failure probability at most $S \cdot (16qu)^{u-1} \le 0.1$. On the other hand, taking $\ell = d$
in Corollary 6.2 we have that $\mathrm{proj}_\rho(\mathsf{SkewedSipser}_{u,d})$ contains $\mathsf{SkewedSipser}_{u,d-1}$ as a subfunction
with failure probability at most 0.1. By a union bound, with probability at least 0.8, a draw of
$\rho \leftarrow \mathcal{D}_u^{(d)}(q)$ satisfies both of the above, and we fix any such restriction $\rho^{(d)} \in \mathrm{supp}(\mathcal{D}_u^{(d)}(q))$. A
further deterministic “trimming” restriction (by only setting certain variables to 0; note that this
can only simplify $\mathrm{proj}_{\rho^{(d)}}(C)$ further) causes the target $\mathrm{proj}_{\rho^{(d)}}(\mathsf{SkewedSipser}_{u,d})$ to become exactly
$\mathsf{SkewedSipser}_{u,d-1}$. Let us write $C_d$ to denote the resulting simplified version of the original circuit
$C$ after the combined “project-and-trim”. As $C$ is supposed to compute $\mathsf{SkewedSipser}_{u,d}$, $C_d$ must
compute $\mathsf{SkewedSipser}_{u,d-1}$.

† In this initial application we view $C$ as having an extra layer of gates of fan-in 1 next to the input variables, so
we have a valid application of Theorem 14 with $r = 1$ and $s = u - 1 \ge 1$.

‡ Note that $\mathrm{proj}_\rho(C)$ may have a large number of gates at distance 1 from the inputs but it suffices for our purpose
to bound the number of gates at distance at least 2 from the inputs.

Next, we consider what happens to $\mathsf{SkewedSipser}_{u,d-1}$ and $C_d$ if we hit them both with $\mathrm{proj}_\rho(\cdot)$
for $\rho \leftarrow \mathcal{D}_u^{(d-1)}(q)$. Applying Theorem 14 (with $r = s = u - 1$) to each of the gates at distance 2
from the inputs and taking a union bound, the resulting circuit $\mathrm{proj}_\rho(C_d)$ has depth $d - 1$, bottom
fan-in $u - 1$, and at most $S$ gates at distance at least 2 from the inputs with failure probability
at most $S \cdot (16ruq)^{u-1} < S \cdot (16u^2 q)^{u-1} \le 0.1$. On the other hand, taking $\ell = d - 1$ we can
again apply Corollary 6.2 to $\mathsf{SkewedSipser}_{u,d-1}$ and we have that $\mathrm{proj}_\rho(\mathsf{SkewedSipser}_{u,d-1})$ contains
$\mathsf{SkewedSipser}_{u,d-2}$ as a subfunction with failure probability at most 0.1. Once again by a union
bound, with probability at least 0.8 a draw of $\rho \leftarrow \mathcal{D}_u^{(d-1)}(q)$ satisfies both of the above, and we
fix any such restriction $\rho^{(d-1)} \in \mathrm{supp}(\mathcal{D}_u^{(d-1)}(q))$. As before we perform a deterministic trimming
restriction that causes the target $\mathrm{proj}_{\rho^{(d-1)}}(\mathsf{SkewedSipser}_{u,d-1})$ to become exactly $\mathsf{SkewedSipser}_{u,d-2}$
and we let $C_{d-1}$ be the resulting simplified version of $C_d$ after the combined project-and-trim. As
$C_d$ computes $\mathsf{SkewedSipser}_{u,d-1}$ we have that $C_{d-1}$ must compute $\mathsf{SkewedSipser}_{u,d-2}$.
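
The two failure-probability bounds used so far follow from the definition of $S$ by direct cancellation. The worked check below reads (9) as $S = 0.1 / (16u^2 q)^{u-1}$ and assumes that Theorem 14 bounds the failure probability for a single gate by $(16ruq)^{s}$, which is how it is invoked in the text above; both readings are assumptions of this sketch.

```latex
% First step: r = 1, s = u - 1, union bound over at most S gates.
\[
  S\cdot(16qu)^{u-1}
  \;=\; 0.1\left(\frac{16qu}{16u^{2}q}\right)^{u-1}
  \;=\; \frac{0.1}{u^{\,u-1}} \;\le\; 0.1 .
\]
% Later steps: r = s = u - 1.
\[
  S\cdot(16ruq)^{u-1}
  \;=\; 0.1\left(\frac{(u-1)\,u}{u^{2}}\right)^{u-1}
  \;=\; 0.1\left(\frac{u-1}{u}\right)^{u-1} \;<\; 0.1 .
\]
```

Note that $q$ cancels in both lines, so the choice $q = w^{-669/1000}$ only matters for the size of $S$ itself and for Corollary 6.2.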

Repeating the argument above, each time taking $r = s = u - 1$ in Theorem 14, there exists a
sequence of restrictions $\rho^{(d-2)} \in \mathrm{supp}(\mathcal{D}_u^{(d-2)}(q)), \dots, \rho^{(1)} \in \mathrm{supp}(\mathcal{D}_u^{(1)}(q))$ and their resulting circuits
$C_{d-2}, \dots, C_1$ such that

• Hard function retains structure. For $1 \le \ell \le d - 2$, $\mathrm{proj}_{\rho^{(\ell)}}(\mathsf{SkewedSipser}_{u,\ell})$ contains
$\mathsf{SkewedSipser}_{u,\ell-1}$ as a subfunction, and hence there exists a deterministic trimming restriction
that results in $\mathrm{proj}_{\rho^{(\ell)}}(\mathsf{SkewedSipser}_{u,\ell})$ becoming exactly $\mathsf{SkewedSipser}_{u,\ell-1}$.

• Circuit collapses. For $2 \le \ell \le d - 2$, the circuit $\mathrm{proj}_{\rho^{(\ell)}}(C_{\ell+1})$ has depth $\ell$, bottom fan-in
$u - 1$, and has at most $S$ gates at distance at least 2 from the inputs. Furthermore, $C_\ell$ is
the simplified version of $\mathrm{proj}_{\rho^{(\ell)}}(C_{\ell+1})$ after the deterministic trimming restriction associated
with $\mathrm{proj}_{\rho^{(\ell)}}(\mathsf{SkewedSipser}_{u,\ell})$. Finally, the circuit $\mathrm{proj}_{\rho^{(1)}}(C_2)$ can be expressed as a depth-
$(u - 1)$ decision tree, and $C_1$ is the simplified version of $\mathrm{proj}_{\rho^{(1)}}(C_2)$ after the deterministic
trimming restriction associated with $\mathrm{proj}_{\rho^{(1)}}(\mathsf{SkewedSipser}_{u,1})$.
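
The last item ends with a decision tree of depth at most $u - 1$, and the argument relies on the fact that such a tree cannot compute an OR of $u$ or more variables. A minimal Python sketch of the standard adversary argument behind this fact is given below. The tree encoding (a leaf is `0` or `1`, an internal node is a tuple `(var, subtree_if_0, subtree_if_1)`) and the function names are hypothetical choices for this illustration and do not come from the paper.

```python
def evaluate(tree, x):
    """Follow the root-to-leaf path selected by input x and return the leaf value."""
    while not isinstance(tree, int):
        var, if0, if1 = tree
        tree = if1 if x[var] else if0
    return tree


def depth(tree):
    """Number of queries on the longest root-to-leaf path."""
    if isinstance(tree, int):
        return 0
    _, if0, if1 = tree
    return 1 + max(depth(if0), depth(if1))


def fooling_pair(tree, k):
    """Given a tree of depth < k over variables 0..k-1, return two inputs that
    reach the same leaf but have different OR values."""
    assert depth(tree) < k
    queried = set()
    node = tree
    # Adversary strategy: answer every query with 0 and record what was asked.
    while not isinstance(node, int):
        var, if0, _ = node
        queried.add(var)
        node = if0
    # Fewer than k variables were queried, so some variable is untouched.
    free = next(i for i in range(k) if i not in queried)
    zeros = [0] * k
    flipped = list(zeros)
    flipped[free] = 1  # changes the OR, but not the path taken in the tree
    return zeros, flipped


# Usage: a depth-2 tree over k = 3 variables that never reads variable 2.
example_tree = (0, (1, 0, 1), 1)
a, b = fooling_pair(example_tree, 3)
```

Both returned inputs follow the all-zeros path, so the tree outputs the same value on them, while their ORs differ; hence the tree errs on at least one of the two.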

The above implies that $C_\ell$ computes $\mathsf{SkewedSipser}_{u,\ell-1}$ for all $1 \le \ell \le d - 2$. This yields the
desired contradiction since $C_1$, a decision tree of depth at most $u - 1$, cannot compute $\mathsf{SkewedSipser}_{u,0}$,
the OR of $w^{33/100} \ge u$ many variables. Hence any depth-$d$ circuit computing $\mathsf{SkewedSipser}_{u,d}$ must
have size at least $S$, where $S$ is the quantity defined in (9). The following calculation showing that
$S = n^{\Omega(u/d)}$ completes the proof of Theorem 2:

Claim 6.3. $S = n^{\Omega(u/d)}$.


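Proof. We first observe that

$$n = u^d w^{d + \frac{33}{100}} \;\le\; w^{\frac{33}{100} d}\, w^{d + \frac{33}{100}} \;\le\; w^{\frac{133}{100} d + \frac{33}{200} d} \;<\; w^{3d/2}, \quad \text{and hence } n^{2/(3d)} < w,$$

where we used $u \le w^{33/100}$ for the first inequality and $d \ge 2$ for the second. As a result we have

$$S = 0.1 \left( \frac{w^{669/1000}}{16 u^2} \right)^{u-1} \;\ge\; 0.1 \left( \frac{w^{9/1000}}{16} \right)^{u/2} \;\ge\; 0.1 \left( \frac{n^{\frac{2}{3d} \cdot \frac{9}{1000}}}{16} \right)^{u/2} = n^{\Omega(u/d)},$$

where we used $q = w^{-669/1000}$ for the first equality, $2 \le u \le w^{33/100}$ for the first inequality, and
$w > n^{2/(3d)}$ for the final inequality. □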
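
The chain of inequalities in this proof can be sanity-checked numerically. The sketch below works with base-2 logarithms, because the bounds only become meaningful once $w^{9/1000} > 16$, that is, once $\log_2 w > 4000/9 \approx 444.4$, which is far beyond ordinary floating-point range. The function name and the sample parameters are hypothetical, and the formula for $S$ is the reading $S = 0.1\,(w^{669/1000}/(16u^2))^{u-1}$ used in the proof.

```python
import math


def claim_6_3_chain(log2_w, u, d):
    """Return base-2 logs of n, S, and the two lower bounds in the chain of Claim 6.3."""
    log2_u = math.log2(u)
    # Parameter regime assumed in the proof: 2 <= u <= w^(33/100) and 2 <= d <= u.
    assert u >= 2 and log2_u <= 0.33 * log2_w
    assert 2 <= d <= u
    log2_tenth = math.log2(0.1)
    # n = u^d * w^(d + 33/100)
    log2_n = d * log2_u + (d + 33 / 100) * log2_w
    # S = 0.1 * (w^(669/1000) / (16 u^2))^(u-1), with log2(16) = 4
    log2_S = log2_tenth + (u - 1) * (669 / 1000 * log2_w - 4 - 2 * log2_u)
    # 0.1 * (w^(9/1000) / 16)^(u/2)
    log2_mid = log2_tenth + (u / 2) * (9 / 1000 * log2_w - 4)
    # 0.1 * (n^((2/(3d)) * (9/1000)) / 16)^(u/2)
    log2_last = log2_tenth + (u / 2) * ((2 / (3 * d)) * (9 / 1000) * log2_n - 4)
    return {"n": log2_n, "S": log2_S, "mid": log2_mid, "last": log2_last}


# Usage with illustrative values: w = 2^1000, u = 2^20, d = 5.
vals = claim_6_3_chain(1000, 2 ** 20, 5)
ratio = vals["S"] / (vals["n"] * (2 ** 20) / 5)  # exponent c with S = n^(c * u / d)
```

For these sample values the three quantities appear in the order $S \ge \text{mid} \ge \text{last}$, and `ratio` is the effective constant in the exponent for this one parameter setting; it says nothing about the constant $c$ of Theorem 2 in general.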


Remark 18. We remark that a straightforward construction yields small-depth circuits computing
$\mathsf{SkewedSipser}_{u,d}$ that nearly match the lower bound given by Theorem 2. This construction simply
applies de Morgan’s law to convert a $u$-way AND of $w$-way ORs into a $w^u$-way OR of $u$-way ANDs.
This is done for all of the relevant AND gates in $\mathsf{SkewedSipser}_{u,d}$. Collapsing
adjacent layers of gates after this conversion, we obtain a depth-$(d + 1)$ circuit of size $n^{O(u)}$ that
computes the $\mathsf{SkewedSipser}_{u,d}$ function.
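
The conversion step in this remark, turning a $u$-way AND of $w$-way ORs into a $w^u$-way OR of $u$-way ANDs, amounts to choosing one input from each OR in every possible way. The Python sketch below illustrates this on a small read-once instance and checks equivalence by brute force. All function names are hypothetical, and the sketch covers only the single-gate conversion, not the layer-collapsing step of the full construction.

```python
from itertools import product


def and_of_ors_to_or_of_ands(clauses):
    """Expand an AND of OR-clauses into an OR of AND-terms:
    one term for each way of picking one variable from every clause."""
    return [frozenset(choice) for choice in product(*clauses)]


def eval_and_of_ors(clauses, x):
    return all(any(x[v] for v in clause) for clause in clauses)


def eval_or_of_ands(terms, x):
    return any(all(x[v] for v in term) for term in terms)


def build_read_once_clauses(u, w):
    """u disjoint clauses, each over w fresh variables (indices 0 .. u*w - 1)."""
    return [list(range(i * w, (i + 1) * w)) for i in range(u)]


def equivalent(clauses, terms, num_vars):
    """Brute-force check that both formulas agree on every assignment."""
    return all(
        eval_and_of_ors(clauses, x) == eval_or_of_ands(terms, x)
        for x in product([0, 1], repeat=num_vars)
    )


# Usage: u = 2 clauses of width w = 3 give w^u = 9 terms of size u = 2.
u, w = 2, 3
clauses = build_read_once_clauses(u, w)
terms = and_of_ors_to_or_of_ands(clauses)
```

The number of terms is exactly $w^u$, which is the source of the exponential dependence on $u$ in the size of the resulting circuit.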